Abstract

We define a notion of barycenter for random probability measures in the Wasserstein
space. We give a characterization of the population barycenter in terms of existence and
uniqueness for compactly supported measures. Then, the problem of estimating this barycenter
from n independent and identically distributed random probability measures is considered.
We study the convergence of the empirical barycenter proposed in Agueh and Carlier
[2] to its population counterpart as the number of measures n tends to infinity. To illustrate
the benefits of this approach for data analysis and statistics, we finally discuss the usefulness
of barycenters in the Wasserstein space for curve and image warping.
1 Introduction

In this paper, we consider the problem of defining the barycenter of random probability measures
on $\mathbb{R}^d$. The set of Radon probability measures endowed with the 2-Wasserstein distance is not a
Euclidean space. Consequently, to define a notion of barycenter for random probability measures,
it is natural to use the notion of Fréchet mean [13], which extends the usual Euclidean
barycenter to non-linear spaces endowed with non-Euclidean metrics. If $\boldsymbol{Y}$ denotes a random
variable with distribution $\mathbb{P}$ taking its value in a metric space $(\mathcal{M}, d_{\mathcal{M}})$, then a Fréchet mean
(not necessarily unique) of the distribution $\mathbb{P}$ is a point $m^* \in \mathcal{M}$ that is a global minimum of
the functional

$$J(m) = \frac{1}{2} \int_{\mathcal{M}} d^2_{\mathcal{M}}(m, y)\, d\mathbb{P}(y), \quad \text{i.e.} \quad m^* \in \arg\min_{m \in \mathcal{M}} J(m).$$

In this paper, a Fréchet mean of a distribution $\mathbb{P}$ will also be called a barycenter. An
empirical Fréchet mean of an independent and identically distributed (iid) sample $\boldsymbol{Y}_1, \ldots, \boldsymbol{Y}_n$ of
distribution $\mathbb{P}$ is

$$\bar{\boldsymbol{Y}}_n \in \arg\min_{m \in \mathcal{M}} \frac{1}{n} \sum_{j=1}^{n} \frac{1}{2} d^2_{\mathcal{M}}(m, \boldsymbol{Y}_j).$$

For random variables belonging to nonlinear metric spaces, a well-known example is the
computation of the mean of a set of planar shapes in Kendall's shape space [23], which leads to
the Procrustean means studied in [16]. Many properties of the Fréchet mean in finite dimensional
Riemannian manifolds (such as consistency and uniqueness) have been investigated in
several works, including [21]. For random variables taking their value in metric spaces of nonpositive
curvature (NPC), a detailed study of various properties of their barycenter can be found in [27]. Recently,
some properties of the Fréchet mean in bounded metric spaces have also been studied in [15].
However, there is little work on Fréchet means in infinite dimensional metric spaces that
do not satisfy the global NPC property as defined in [27].

In this paper, we consider the case where $\boldsymbol{Y} = \boldsymbol{\mu}$ is a random probability measure belonging
to the 2-Wasserstein space on $\mathbb{R}^d$ with distribution $\mathbb{P}$. More precisely, we propose to study some
properties of the barycenter $\mu^*$ of $\boldsymbol{\mu}$ defined as the following Fréchet mean

$$\mu^* = \arg\min_{\nu \in \mathcal{M}_+(\mathbb{R}^d)} \int_{\mathcal{M}_+(\mathbb{R}^d)} \frac{1}{2} d^2_{W_2}(\nu, \mu)\, d\mathbb{P}(\mu), \tag{1.1}$$

where $\mathcal{M}_+(\mathbb{R}^d)$ is the set of Radon probability measures on $\mathbb{R}^d$, and $d^2_{W_2}$ denotes the squared
2-Wasserstein distance between two probability measures. Note that $\mathbb{P}$ denotes a probability
distribution on the space of probability measures $(\mathcal{M}_+(\mathbb{R}^d), \mathcal{B}(\mathcal{M}_+(\mathbb{R}^d)))$, where $\mathcal{B}(\mathcal{M}_+(\mathbb{R}^d))$ is
the Borel $\sigma$-algebra generated by the topology induced by the distance $d_{W_2}$. If it exists and is
unique, the measure $\mu^*$ will be referred to as the population barycenter of the random measure
$\boldsymbol{\mu}$ with distribution $\mathbb{P}$. The empirical counterpart of $\mu^*$ is the barycenter $\bar{\mu}_n$ defined as

$$\bar{\mu}_n = \arg\min_{\nu \in \mathcal{M}_+(\mathbb{R}^d)} \frac{1}{n} \sum_{j=1}^{n} \frac{1}{2} d^2_{W_2}(\nu, \mu_j),$$

where $\mu_1, \ldots, \mu_n$ are iid random measures sampled from the distribution $\mathbb{P}$. A detailed
characterization of $\bar{\mu}_n$ in terms of existence, uniqueness and regularity, together with its link to the
multi-marginal problem in optimal transport, has been proposed in [2].
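To make the empirical barycenter concrete, here is a minimal sketch in dimension one, where the barycenter can be computed without solving a multi-marginal problem. It relies on the standard one-dimensional fact that the quantile function of the 2-Wasserstein barycenter is the average of the quantile functions; the function name `barycenter_1d` and the toy data are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def barycenter_1d(samples):
    """Empirical 2-Wasserstein barycenter of n one-dimensional empirical measures.

    samples has shape (n, m): row j holds the m atoms of mu_j, each with
    weight 1/m. Sorting a row gives its empirical quantile function, and the
    barycenter's quantile function is the column-wise average of these.
    """
    sorted_atoms = np.sort(np.asarray(samples, dtype=float), axis=1)
    return sorted_atoms.mean(axis=0)

# Toy data (assumption): shifted copies of one base sample.
rng = np.random.default_rng(0)
base = rng.uniform(0.0, 1.0, size=200)
shifts = np.array([-1.0, 0.0, 0.5, 2.5])
samples = base[None, :] + shifts[:, None]

# Atoms of the barycenter: the base sample shifted by the average shift.
bar_mu = barycenter_1d(samples)
```

For shifted copies of a single measure, the barycenter is that measure translated by the mean shift, which gives a simple sanity check of the sketch.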

The first contribution of this paper is to discuss some assumptions on $\mathbb{P}$ that guarantee the
existence and uniqueness of the population barycenter. These results are based on an adaptation
of the arguments developed in [2] for the characterization of the empirical barycenter $\bar{\mu}_n$. In
particular, we propose a dual formulation of the optimisation problem (1.1) that allows a precise
study of some properties of the population barycenter such as its uniqueness. Therefore, our
approach is very much connected with the theory of optimal mass transport, and with the
characterization of the Monge-Kantorovich problem via arguments from convex analysis and
duality, see [31] for further details on this topic. A second contribution of this paper is to study
the convergence of $\bar{\mu}_n$ to $\mu^*$ as the number n of measures tends to infinity. Finally, we show that
this notion of barycenter of probability measures has interesting applications in various statistical
models for data analysis.

The paper is organised as follows. In Section 2, we give a characterisation of the
population barycenter in the case of compactly supported measures. The convergence of the
empirical barycenter is discussed in Section 3. As an application of the methodology developed
in this paper, we discuss in Section 4 the usefulness of barycenters in the Wasserstein space for curve
and image warping problems. Finally, we give a conclusion and some perspectives in Section 5.

Throughout the paper, we use bold symbols $\boldsymbol{Y}, \boldsymbol{\mu}, \boldsymbol{\theta}, \ldots$ to denote random variables.

2 Characterisation of the population barycenter of compactly
supported measures

2.1 Some definitions and notations

The notation $|x|$ is used to denote the usual Euclidean norm of a vector $x \in \mathbb{R}^m$, and the
notation $\langle x, y \rangle$ denotes the usual inner product for $x, y \in \mathbb{R}^m$. Let $\Omega$ be a compact set in $\mathbb{R}^d$, and
let $\delta(\Omega) = \sup_{(x,y) \in \Omega \times \Omega} |x - y|$ be its diameter. Let $X = C(\Omega, \mathbb{R})$ be the space of continuous
functions $f : \Omega \to \mathbb{R}$ equipped with the supremum norm

$$\|f\|_X = \sup_{x \in \Omega} \{ |f(x)| \}.$$

We denote by $\mathcal{M}(\Omega)$ the space of bounded Radon measures on $\Omega$ and by $\mathcal{M}_+(\Omega)$ the set of Radon
probability measures. Note that any $\nu \in \mathcal{M}_+(\Omega)$ can be considered as a probability measure
on $\mathbb{R}^d$ having a compact support included in $\Omega$. In this section, we characterize the population
barycenter of a specific class of random probability measures taking their values in $\mathcal{M}_+(\Omega)$.

Let $X' = \mathcal{M}(\Omega)$ be the topological dual of $X$. We recall that the squared 2-Wasserstein
distance between two probability measures $\mu, \nu \in \mathcal{M}_+(\Omega)$ is

$$d^2_{W_2}(\mu, \nu) = \inf_{\gamma \in \Pi(\mu, \nu)} \int_{\Omega \times \Omega} |x - y|^2 \, d\gamma(x, y),$$

where $\Pi(\mu, \nu)$ is the set of all probability measures on $\Omega \times \Omega$ having $\mu$ and $\nu$ as marginals, see
e.g. [31]. We recall that $\gamma \in \Pi(\mu, \nu)$ is called an optimal transport plan between $\mu$ and $\nu$ if

$$d^2_{W_2}(\mu, \nu) = \int_{\Omega \times \Omega} |x - y|^2 \, d\gamma(x, y).$$

Let $T : \Omega \to \Omega$ be a measurable mapping, and let $\mu \in \mathcal{M}(\Omega)$. The push-forward measure
$T\#\mu$ of $\mu$ through the map $T$ is defined as the measure such that

$$\int_\Omega f(x) \, d(T\#\mu)(x) = \int_\Omega f(T(x)) \, d\mu(x), \quad \text{for all } f \in X.$$
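The definitions above can be illustrated numerically in the simplest setting. The following sketch assumes one-dimensional empirical measures with the same number of equally weighted atoms, for which the monotone (sorted) coupling is an optimal transport plan; the helper names `w2_empirical_1d` and `push_forward` are hypothetical and only serve the illustration.

```python
import numpy as np

def w2_empirical_1d(x, y):
    """2-Wasserstein distance between two 1D empirical measures.

    Both measures are assumed to have the same number of atoms with equal
    weights, so the optimal plan pairs the sorted atoms.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("both measures must have the same number of atoms")
    return float(np.sqrt(np.mean((x - y) ** 2)))

def push_forward(atoms, T):
    """Atoms of T#mu when mu is the empirical measure carried by `atoms`."""
    return T(np.asarray(atoms, dtype=float))

rng = np.random.default_rng(1)
atoms = rng.uniform(0.0, 1.0, size=500)
shifted = push_forward(atoms, lambda x: x + 0.3)

# Change of variables: the integral of f against T#mu equals the integral
# of f(T(x)) against mu (both are plain averages for empirical measures).
lhs = np.mean(np.cos(shifted))
rhs = np.mean(np.cos(atoms + 0.3))

# Translating a measure by 0.3 moves it by exactly 0.3 in W2 distance.
dist = w2_empirical_1d(atoms, shifted)
```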
We also recall the following well known result in optimal transport (see e.g. [31] or Proposition
3.3 in [2]):

Proposition 2.1. Let $\mu, \nu \in \mathcal{M}_+(\Omega)$. Then, $\gamma \in \Pi(\mu, \nu)$ is an optimal transport plan between $\mu$
and $\nu$ if and only if the support of $\gamma$ is included in the set $\partial\phi$ that is the graph of the subdifferential
of a convex and lower semi-continuous function $\phi$ solution of the problem

$$\phi = \arg\min_{\psi \in \mathcal{C}} \left\{ \int_\Omega \psi(x) \, d\mu(x) + \int_\Omega \psi^*(x) \, d\nu(x) \right\},$$

where $\psi^*(x) = \sup_{y \in \Omega} \{ \langle x, y \rangle - \psi(y) \}$ is the convex conjugate of $\psi$, and $\mathcal{C}$ denotes the set of
convex functions $\psi : \Omega \to \mathbb{R}$ that are lower semi-continuous.

Moreover, if $\mu$ admits a density with respect to the Lebesgue measure on $\mathbb{R}^d$, then there exists
a unique optimal transport plan $\gamma \in \Pi(\mu, \nu)$ that is of the form $\gamma = (\mathrm{id}, \nabla\phi)\#\mu$ where $\nabla\phi$ denotes
the gradient of $\phi$. The uniqueness of the transport plan holds in the sense that if $\nabla\phi\#\mu = \nabla\psi\#\mu$,
where $\psi : \Omega \to \mathbb{R}$ is a convex function, then $\phi = \psi$ $\mu$-almost everywhere.

Proposition 2.1 is the key ingredient in the proof of Theorem 2.1 (stated later on in this
section) to show the uniqueness of a population barycenter. Let us finally recall that the Wasserstein
space $(\mathcal{M}_+(\Omega), d_{W_2})$ is a compact metric space since $\Omega$ is a compact subset of $\mathbb{R}^d$.



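2.2 A parametric class of random probability measures

Now, let us define a class of random probability measures belonging to $\mathcal{M}_+(\Omega)$. Let $\Theta$ be a
compact subset of $\mathbb{R}^p$. Let $\phi : (\mathbb{R}^p, \mathcal{B}(\mathbb{R}^p)) \to (\mathcal{M}_+(\Omega), \mathcal{B}(\mathcal{M}_+(\Omega)))$ be a measurable mapping,
where $\mathcal{B}(\mathbb{R}^p)$ is the Borel $\sigma$-algebra of $\mathbb{R}^p$ and $\mathcal{B}(\mathcal{M}_+(\Omega))$ is the Borel $\sigma$-algebra generated by
the topology induced by the distance $d_{W_2}$. Then, let us define

$$\mathcal{M}_\phi(\Theta) = \{ \mu_\theta \in \mathcal{M}_+(\Omega) \, : \, \mu_\theta = \phi(\theta), \; \theta \in \Theta \}$$

as the set of probability measures $\mu_\theta \in \mathcal{M}_+(\Omega)$ parametrized by the mapping $\phi$ and the compact
set $\Theta$. Throughout the paper, we will suppose that $\phi$ satisfies the following assumption:

Assumption 1. For any $\theta \in \Theta$, the measure $\mu_\theta = \phi(\theta) \in \mathcal{M}_+(\Omega)$ admits a density with respect
to the Lebesgue measure on $\mathbb{R}^d$.

Let $\mathbb{P}_\Theta$ be a probability measure on $\Theta$ with density $g : \Theta \to \mathbb{R}_+$ with respect to the Lebesgue
measure $d\theta$ on $\mathbb{R}^p$. We assume that $g$ satisfies the following regularity condition:

Assumption 2. The density $g$ is $L$-Lipschitz for some constant $L > 0$, i.e.

$$|g(\theta_1) - g(\theta_2)| \le L |\theta_1 - \theta_2|, \quad \text{for any } \theta_1, \theta_2 \in \Theta. \tag{2.1}$$

If $\boldsymbol{\theta} \in \mathbb{R}^p$ is a random vector with density $g$, then $\mu_{\boldsymbol{\theta}} = \phi(\boldsymbol{\theta})$ is a random probability measure
with distribution $\mathbb{P}_g$ on $(\mathcal{M}_+(\Omega), \mathcal{B}(\mathcal{M}_+(\Omega)))$ that is the push-forward measure defined by

$$\mathbb{P}_g(B) = \mathbb{P}_\Theta(\phi^{-1}(B)), \quad \text{for any } B \in \mathcal{B}(\mathcal{M}_+(\Omega)).$$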






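As explained in the introduction, we want to characterize the barycenter (i.e. the Fréchet
mean) of the distribution $\mathbb{P}_g$ when $\mathcal{M}_+(\Omega)$ is endowed with the 2-Wasserstein distance $d_{W_2}$. For
this purpose, let us consider the optimization problem: find

$$\mu^* \in \arg\min_{\nu \in \mathcal{M}_+(\Omega)} \int_{\mathcal{M}_+(\Omega)} \frac{1}{2} d^2_{W_2}(\nu, \mu) \, d\mathbb{P}_g(\mu) = \arg\min_{\nu \in \mathcal{M}_+(\Omega)} \int_\Theta \frac{1}{2} d^2_{W_2}(\nu, \mu_\theta) g(\theta) \, d\theta. \tag{2.2}$$

The main goals of this section are to prove the existence and the uniqueness of $\mu^*$. Since $\Omega$ is
compact, it is obvious that $d^2_{W_2}(\nu, \mu_\theta) \le \delta^2(\Omega)$ and thus

$$\int_\Theta d^2_{W_2}(\nu, \mu_\theta) g(\theta) \, d\theta \le \delta^2(\Omega) < +\infty \quad \text{for any } \nu \in \mathcal{M}_+(\Omega). \tag{2.3}$$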

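2.3 Primal and dual formulations

Consider the problem

$$(\mathcal{P}) \qquad J_{\mathcal{P}} := \inf_{\nu \in \mathcal{M}_+(\Omega)} J(\nu), \quad \text{where } J(\nu) = \frac{1}{2} \int_\Theta d^2_{W_2}(\nu, \mu_\theta) g(\theta) \, d\theta. \tag{2.4}$$

To study the existence and uniqueness of $\mu^*$, let us introduce some definitions. The notation
$f^\Theta = (f_\theta)_{\theta \in \Theta} \in L^1(\Theta, X)$ will denote any application

$$f^\Theta : \Theta \to X, \quad \theta \mapsto f_\theta,$$

such that for any $x \in \Omega$

$$\int_\Theta |f_\theta(x)| \, d\theta < +\infty.$$

Then, following the terminology in [2], we introduce the dual problem

$$(\mathcal{P}^*) \qquad J_{\mathcal{P}^*} := \sup \left\{ \int_\Theta \int_\Omega S_{g(\theta)} f_\theta(x) \, d\mu_\theta(x) \, d\theta \; ; \; f^\Theta \in L^1(\Theta, X) \text{ such that } \int_\Theta f_\theta(x) \, d\theta = 0, \; \forall x \in \Omega \right\},$$

where

$$S_{g(\theta)} f(x) := \inf_{y \in \Omega} \left\{ \frac{g(\theta)}{2} |x - y|^2 - f(y) \right\}, \quad \forall x \in \Omega \text{ and } f \in X.$$

Let us also define

$$H_{g(\theta)}(f) := - \int_\Omega S_{g(\theta)} f(x) \, d\mu_\theta(x),$$

and the Legendre-Fenchel transform of $H_{g(\theta)}$ as, for $\nu \in X'$,

$$H^*_{g(\theta)}(\nu) := \sup_{f \in X} \left\{ \int_\Omega f(x) \, d\nu(x) - H_{g(\theta)}(f) \right\} = \sup_{f \in X} \left\{ \int_\Omega f(x) \, d\nu(x) + \int_\Omega S_{g(\theta)} f(x) \, d\mu_\theta(x) \right\}.$$

In what follows, we will show that the problems $(\mathcal{P})$ and $(\mathcal{P}^*)$ are related in the sense that the
minimal value $J_{\mathcal{P}}$ in problem $(\mathcal{P})$ is equal to the supremum $J_{\mathcal{P}^*}$ in problem $(\mathcal{P}^*)$, see Proposition
2.2 below. We will also show that both problems have optimizers, see Proposition 2.3 below.
This duality will then allow us to characterize the uniqueness of the population barycenter, see
Theorem 2.1.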
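The operator $S_{g(\theta)}$ plays a central role in the dual problem, so a small numerical sketch may help. It discretizes $\Omega = [0, 1]$ on a grid (an assumption made only for illustration; the name `s_transform`, the test function and the value chosen for $g(\theta)$ are hypothetical) and checks three elementary properties used repeatedly in the proofs: the inequality $S_{g(\theta)} f(x) + f(y) \le \frac{g(\theta)}{2}|x - y|^2$, the bound $S_{g(\theta)} S_{g(\theta)} f \ge f$, and the identity $S_{g(\theta)} S_{g(\theta)} S_{g(\theta)} f = S_{g(\theta)} f$.

```python
import numpy as np

def s_transform(f_vals, grid, g_theta):
    """Discrete analogue of S_{g(theta)} f(x) = inf_y { g/2 |x - y|^2 - f(y) }.

    f_vals[j] is the value of f at grid[j]; the infimum over Omega is
    replaced by a minimum over the grid points.
    """
    cost = 0.5 * g_theta * (grid[:, None] - grid[None, :]) ** 2
    return np.min(cost - f_vals[None, :], axis=1)

grid = np.linspace(0.0, 1.0, 101)   # discretization of Omega = [0, 1]
g_theta = 2.0                       # value of the density g at some theta
f_vals = np.sin(6.0 * grid)         # an arbitrary continuous test function

Sf = s_transform(f_vals, grid, g_theta)
SSf = s_transform(Sf, grid, g_theta)
SSSf = s_transform(SSf, grid, g_theta)

# cost[i, j] = g/2 |x_i - y_j|^2, used to check the defining inequality.
cost = 0.5 * g_theta * (grid[:, None] - grid[None, :]) ** 2
```

On the grid these properties hold exactly (up to floating-point rounding), because they only use the definition of the minimum and the symmetry of the cost.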
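Proposition 2.2. Suppose that Assumption 1 and Assumption 2 are satisfied. Then,

$$J_{\mathcal{P}} = J_{\mathcal{P}^*}.$$

Proof. 1. Let us first prove that $J_{\mathcal{P}} \ge J_{\mathcal{P}^*}$.

By definition, for any $f^\Theta \in L^1(\Theta, X)$ such that $\int_\Theta f_\theta(x) \, d\theta = 0$ for all $x \in \Omega$, and for all $x, y \in \Omega$, we
have

$$S_{g(\theta)} f_\theta(x) + f_\theta(y) \le \frac{g(\theta)}{2} |x - y|^2.$$

Let $\nu \in \mathcal{M}_+(\Omega)$ and $\gamma_\theta \in \Pi(\mu_\theta, \nu)$ be an optimal transport plan between $\mu_\theta$ and $\nu$. By integrating
the above inequality with respect to $\gamma_\theta$ we obtain

$$\int_\Omega S_{g(\theta)} f_\theta(x) \, d\mu_\theta(x) + \int_\Omega f_\theta(y) \, d\nu(y) \le \int_{\Omega \times \Omega} \frac{g(\theta)}{2} |x - y|^2 \, d\gamma_\theta(x, y) = \frac{g(\theta)}{2} d^2_{W_2}(\mu_\theta, \nu).$$

Integrating now with respect to $d\theta$, and using Fubini's Theorem together with the constraint
$\int_\Theta f_\theta(y) \, d\theta = 0$, we get

$$\int_\Theta \int_\Omega S_{g(\theta)} f_\theta(x) \, d\mu_\theta(x) \, d\theta \le \int_\Theta \frac{g(\theta)}{2} d^2_{W_2}(\mu_\theta, \nu) \, d\theta.$$

Therefore we deduce that $J_{\mathcal{P}} \ge J_{\mathcal{P}^*}$.

2. Let us now prove the converse inequality $J_{\mathcal{P}} \le J_{\mathcal{P}^*}$.

Thanks to the Kantorovich duality formula (see e.g. [31], or Lemma 2.1 in [2]) we have that
$H^*_{g(\theta)}(\nu) = \frac{1}{2} d^2_{W_2}(\mu_\theta, \nu) g(\theta)$ for any $\nu \in \mathcal{M}_+(\Omega)$. Therefore, it follows that

$$J_{\mathcal{P}} = \inf \left\{ \int_\Theta H^*_{g(\theta)}(\nu) \, d\theta, \; \nu \in X' \right\} = - \left( \int_\Theta H^*_{g(\theta)} \, d\theta \right)^* (0). \tag{2.5}$$

Define the inf-convolution of $(H_{g(\theta)})_{\theta \in \Theta}$ by

$$H(f) := \inf \left\{ \int_\Theta H_{g(\theta)}(f_\theta) \, d\theta \; ; \; f^\Theta \in L^1(\Theta, X), \; \int_\Theta f_\theta(x) \, d\theta = f(x), \; \forall x \in \Omega \right\}, \quad \forall f \in X.$$

We have on the other hand that

$$J_{\mathcal{P}^*} = -H(0).$$

Using Theorem 1.6 in [22], one has that for any $\nu \in X'$

$$H^*(\nu) = \int_\Theta H^*_{g(\theta)}(\nu) \, d\theta.$$

Then, thanks to (2.5), it follows that

$$J_{\mathcal{P}} = -H^{**}(0) \ge -H(0) = J_{\mathcal{P}^*}.$$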
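Let us now prove that $H^{**}(0) = H(0)$. Since $H$ is convex, it is sufficient to show that $H$ is
continuous at 0 for the supremum norm of the space $X$ (see e.g. [12]). For this purpose, let
$f^\Theta \in L^1(\Theta, X)$ and remark that it follows from the definition of $H_{g(\theta)}$ that

$$H_{g(\theta)}(f_\theta) = \int_\Omega \sup_{y \in \Omega} \left\{ f_\theta(y) - \frac{g(\theta)}{2} |x - y|^2 \right\} d\mu_\theta(x),$$

which implies (taking $y = 0$ in the supremum) that

$$H(f) \ge f(0) - \int_\Theta \frac{g(\theta)}{2} \int_\Omega |x|^2 \, d\mu_\theta(x) \, d\theta > -\infty, \quad \forall f \in X.$$

Let $f \in X$ be such that $\|f\|_X \le 1/4$ and choose $f^\Theta \in L^1(\Theta, X)$ defined by $f_\theta(x) = f(x) g(\theta)$ for all
$\theta \in \Theta$ and $x \in \Omega$. It follows that

$$H(f) \le \int_\Theta H_{g(\theta)}(f(\cdot) g(\theta)) \, d\theta \le \int_\Theta \int_\Omega \sup_{y \in \Omega} \left\{ \frac{g(\theta)}{4} - \frac{g(\theta)}{2} |x - y|^2 \right\} d\mu_\theta(x) \, d\theta = \frac{1}{4}.$$

Hence, the convex function $H$ never takes the value $-\infty$ and is bounded from above in a
neighborhood of 0 in $X$. Therefore, by standard results in convex analysis (see e.g. [12]), $H$ is
continuous at 0, and therefore $H^{**}(0) = H(0)$, which completes the proof.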

□ 

Let us now prove the existence of an optimizer for the primal problem $(\mathcal{P})$ and its dual $(\mathcal{P}^*)$,
as formulated in the following proposition:

Proposition 2.3. Suppose that Assumption 1 and Assumption 2 are satisfied. Then, both
problems $(\mathcal{P})$ and $(\mathcal{P}^*)$ admit an optimizer.

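Proof. 1. Let $\nu^n$ be a minimizing sequence of $(\mathcal{P})$. Since $\Omega$ is compact, the sequence $\int_\Omega |x|^2 \, d\nu^n(x)$
is uniformly bounded. Hence, by Chebyshev's inequality, the sequence $\nu^n$ is tight and by
Prokhorov's Theorem there exists a (non relabeled) subsequence that weakly converges to some
$\mu^* \in \mathcal{M}_+(\Omega)$. Since $\Omega$ is compact, it can be checked that $\nu \mapsto d^2_{W_2}(\mu_\theta, \nu)$ is lower semi-continuous
(l.s.c.) on $\mathcal{M}_+(\Omega)$. Therefore, by Fatou's Lemma,

$$\int_\Theta \frac{1}{2} d^2_{W_2}(\mu_\theta, \mu^*) g(\theta) \, d\theta = \int_\Theta \liminf_n \frac{1}{2} d^2_{W_2}(\mu_\theta, \nu^n) g(\theta) \, d\theta \le \liminf_n \int_\Theta \frac{1}{2} d^2_{W_2}(\mu_\theta, \nu^n) g(\theta) \, d\theta,$$

and therefore $J(\mu^*) = \inf_{\nu \in \mathcal{M}_+(\Omega)} \frac{1}{2} \int_\Theta d^2_{W_2}(\nu, \mu_\theta) g(\theta) \, d\theta$, which proves that $(\mathcal{P})$ admits a
minimizer.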

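2. Let $f^\Theta \in L^1(\Theta, X)$ be such that $\int_\Theta f_\theta \, d\theta = 0$ and define $h_\theta(x) = S_{g(\theta)} \circ S_{g(\theta)} f_\theta(x)$ for every $x \in \Omega$
and $\theta \in \Theta$. It is easy to check that $f_\theta(x) \le h_\theta(x)$ and that $h_\theta(x) \le \frac{g(\theta)}{2} |x|^2 - S_{g(\theta)} f_\theta(0)$. Hence,
these two inequalities imply that $\theta \mapsto h_\theta \in L^1(\Theta, X)$. Now, define $\tilde{f}_\theta = h_\theta - \int_\Theta h_u \, du$ for every
$\theta \in \Theta$. Since $\int_\Theta h_u \, du \ge \int_\Theta f_u \, du = 0$, one has that $\tilde{f}_\theta \le h_\theta$, which implies that $S_{g(\theta)} \tilde{f}_\theta \ge S_{g(\theta)} h_\theta$
since $S_{g(\theta)}$ is order-reversing. Since $S_{g(\theta)} h_\theta = S^3_{g(\theta)} f_\theta$, it follows that $S_{g(\theta)} h_\theta \ge S_{g(\theta)} f_\theta$. Moreover,
the inequality $f_\theta \le h_\theta$ implies that $S_{g(\theta)} f_\theta \ge S_{g(\theta)} h_\theta$, which finally shows that $S_{g(\theta)} f_\theta = S_{g(\theta)} h_\theta$
and therefore

$$\int_\Theta \int_\Omega S_{g(\theta)} \tilde{f}_\theta(x) \, d\mu_\theta(x) \, d\theta \ge \int_\Theta \int_\Omega S_{g(\theta)} f_\theta(x) \, d\mu_\theta(x) \, d\theta.$$

Hence, one may assume that the supremum in $(\mathcal{P}^*)$ can be restricted to the $f^\Theta \in L^1(\Theta, X)$
satisfying $f_\theta = h_\theta - \int_\Theta h_u \, du$ with $S^2_{g(\theta)} h_\theta = h_\theta$ for every $\theta \in \Theta$. Note that one may also assume
that $h_\theta(0) = 0$, since the functional $\int_\Theta \int_\Omega S_{g(\theta)} f_\theta(x) \, d\mu_\theta(x) \, d\theta$ in problem $(\mathcal{P}^*)$ is invariant when
one adds to the $f_\theta$'s constants $c_\theta$ that integrate to zero, namely $\int_\Theta c_\theta \, d\theta = 0$.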

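Now, let $f^{\Theta,n} \in L^1(\Theta, X)$ be a maximizing sequence for problem $(\mathcal{P}^*)$ that can thus be
chosen such that $f^n_\theta = h^n_\theta - \int_\Theta h^n_u \, du$ with $h^n_\theta = S_{g(\theta)} g^n_\theta$, $g^n_\theta = S_{g(\theta)} h^n_\theta$ and $h^n_\theta(0) = 0$ for every
$\theta \in \Theta$. By using (2.1), one has that for any $x, y, z \in \Omega$ and $\theta_1, \theta_2 \in \Theta$

$$\left| \frac{g(\theta_1)}{2} |x - y|^2 - \frac{g(\theta_2)}{2} |z - y|^2 \right| \le \frac{g(\theta_1)}{2} \left| |x - y|^2 - |z - y|^2 \right| + \frac{|z - y|^2}{2} |g(\theta_1) - g(\theta_2)|$$
$$\le \delta(\Omega) g(\theta_1) |x - z| + \frac{L \delta^2(\Omega)}{2} |\theta_1 - \theta_2|$$
$$\le K \max(|x - z|, |\theta_1 - \theta_2|),$$

where the second inequality uses $\left| |x - y|^2 - |z - y|^2 \right| \le (|x - y| + |z - y|) |x - z|$, and where $K$ is a constant
depending only on $\|g\|_\infty$, $L$ and $\delta(\Omega)$ (for instance, $K = 2 \max\left( \|g\|_\infty \delta(\Omega), \frac{L \delta^2(\Omega)}{2} \right)$ is admissible).

Therefore, the function $(x, \theta) \mapsto \frac{g(\theta)}{2} |x - y|^2 - g^n_\theta(y)$ is $K$-Lipschitz on the compact set $\Omega \times \Theta$.
Thus, the function $(x, \theta) \mapsto h^n_\theta(x)$ is also $K$-Lipschitz, since it is an infimum of $K$-Lipschitz
functions, where $K$ is a constant not depending on $y \in \Omega$. Hence, it follows that for any $x, z \in \Omega$,
and any $\theta_1, \theta_2 \in \Theta$

$$|f^n_{\theta_1}(x) - f^n_{\theta_2}(z)| \le |h^n_{\theta_1}(x) - h^n_{\theta_2}(z)| + \int_\Theta |h^n_u(x) - h^n_u(z)| \, du \le K \left( 1 + \int_\Theta du \right) \max(|x - z|, |\theta_1 - \theta_2|). \tag{2.6}$$

Hence, the functions $(x, \theta) \mapsto f^n_\theta(x)$ are $\tilde{K}$-Lipschitz (with $\tilde{K} = K \left( 1 + \int_\Theta du \right)$) on $\Omega \times \Theta$.
Moreover, using the fact that $h^n_\theta(0) = 0$, it follows that for any $x \in \Omega$ and any $\theta \in \Theta$

$$|f^n_\theta(x)| = |f^n_\theta(x) - f^n_\theta(0)| \le \tilde{K} |x| \le \tilde{K} \delta(\Omega). \tag{2.7}$$

Now, let us define the set

$$\mathcal{A} = \left\{ (x, \theta) \mapsto f^n_\theta(x), \; n \ge 1 \right\}.$$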

One can check that $\mathcal{A}$ is a subset of $C(\Omega \times \Theta, \mathbb{R})$. By inequalities (2.6) and (2.7), the set $\mathcal{A}$ is
equicontinuous and uniformly bounded. Thus, one can use the Ascoli-Arzelà theorem to obtain
that there exists a subsequence of functions $(x, \theta) \mapsto f^{\varphi(n)}_\theta(x)$ that converges uniformly on $\Omega \times \Theta$
to some $(x, \theta) \mapsto f_\theta(x) \in C(\Omega \times \Theta, \mathbb{R})$, where $\varphi(n)$ is an increasing sequence of positive integers.
It is clear that $\int_\Theta |f_\theta(x)| \, d\theta < +\infty$. Moreover, by inequality (2.7) and Lebesgue's dominated
convergence theorem, one finally obtains that $\int_\Theta f_\theta(x) \, d\theta = \lim_{n \to +\infty} \int_\Theta f^{\varphi(n)}_\theta(x) \, d\theta = 0$ for
every $x \in \Omega$. Therefore, one has that $f^\Theta = (f_\theta)_{\theta \in \Theta} \in L^1(\Theta, X)$ with $\int_\Theta f_\theta(x) \, d\theta = 0$ for every
$x \in \Omega$.

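Since $S_{g(\theta)}$ is upper semi-continuous (u.s.c.) on $X$, it follows that

$$\limsup_n S_{g(\theta)} f^{\varphi(n)}_\theta(x) \le \inf_{y \in \Omega} \left\{ \limsup_n \left( \frac{g(\theta)}{2} |x - y|^2 - f^{\varphi(n)}_\theta(y) \right) \right\} \le \inf_{y \in \Omega} \left\{ \frac{g(\theta)}{2} |x - y|^2 - f_\theta(y) \right\} = S_{g(\theta)} f_\theta(x).$$

Using that $\frac{g(\theta)}{2} |\cdot|^2 - S_{g(\theta)} f^{\varphi(n)}_\theta(\cdot)$ is a non-negative function, and given that the function
$(x, \theta) \mapsto \frac{g(\theta)}{2} |x|^2$ is integrable on $\Omega \times \Theta$ with respect to the measure $d\mu_\theta(x) \, d\theta$, Fatou's Lemma implies
that

$$\limsup_n \int_\Theta \int_{\mathbb{R}^d} S_{g(\theta)} f^{\varphi(n)}_\theta(x) \, d\mu_\theta(x) \, d\theta \le \int_\Theta \int_{\mathbb{R}^d} \limsup_n S_{g(\theta)} f^{\varphi(n)}_\theta(x) \, d\mu_\theta(x) \, d\theta \le \int_\Theta \int_{\mathbb{R}^d} S_{g(\theta)} f_\theta(x) \, d\mu_\theta(x) \, d\theta,$$

which shows that

$$J_{\mathcal{P}^*} = \int_\Theta \int_{\mathbb{R}^d} S_{g(\theta)} f_\theta(x) \, d\mu_\theta(x) \, d\theta,$$

and thus that $f^\Theta$ is a maximizer of problem $(\mathcal{P}^*)$.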

□ 

2.4 Uniqueness of the population barycenter 

Let us now use the duality between problems (V) and (V*) to characterize more precisely the 
population barycenter. The following theorem is the main result of this section. 

Theorem 2.1. Suppose that Assumption 1 and Assumption 2 are satisfied. Then, the population
barycenter $\mu^*$ defined by (2.2) exists and is unique. Moreover, the following statements are
equivalent:

1. The measure $\mu^*$ is the unique minimizer of problem $(\mathcal{P})$.

2. If $f^\Theta = (f_\theta)_{\theta \in \Theta} \in L^1(\Theta, X)$ is a maximizer of problem $(\mathcal{P}^*)$, then for every $\theta \in \Theta$ such
that $g(\theta) > 0$

$$\mu^* = \nabla\phi_\theta \# \mu_\theta, \tag{2.8}$$

where $\phi_\theta : \Omega \to \mathbb{R}$ is the convex function defined by

$$\phi_\theta(x) = \frac{1}{2} |x|^2 - \frac{1}{g(\theta)} S_{g(\theta)} f_\theta(x), \quad \text{for all } x \in \Omega.$$






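Proof. We proceed in a way that is similar to what has been done in [2] to characterize an
empirical barycenter. In the proof, we denote by

$$\Theta_g = \{ \theta \in \Theta \, : \, g(\theta) > 0 \}$$

the support of $g$.

Let $f^\Theta \in L^1(\Theta, X)$ be a maximizer of problem $(\mathcal{P}^*)$. By Proposition 2.2 and Proposition 2.3,
it follows that there exists a minimizer $\mu^*$ of $(\mathcal{P})$ such that

$$\frac{1}{2} \int_\Theta d^2_{W_2}(\mu^*, \mu_\theta) g(\theta) \, d\theta = \int_\Theta \int_\Omega S_{g(\theta)} f_\theta(x) \, d\mu_\theta(x) \, d\theta = \int_\Theta \int_\Omega S_{g(\theta)} f_\theta(x) \, d\mu_\theta(x) \, d\theta + \int_\Theta \int_\Omega f_\theta(x) \, d\mu^*(x) \, d\theta, \tag{2.9}$$

using Fubini's theorem and the fact that $\int_\Theta f_\theta(x) \, d\theta = 0$ for all $x \in \Omega$ to obtain the last equality.
Thanks to the Kantorovich duality formula (see e.g. [31], or Lemma 2.1 in [2]) we have that

$$\frac{1}{2} d^2_{W_2}(\mu^*, \mu_\theta) g(\theta) = \sup_{f \in X} \left\{ \int_\Omega S_{g(\theta)} f(x) \, d\mu_\theta(x) + \int_\Omega f(x) \, d\mu^*(x) \right\} \ge \int_\Omega S_{g(\theta)} f_\theta(x) \, d\mu_\theta(x) + \int_\Omega f_\theta(x) \, d\mu^*(x). \tag{2.10}$$

Therefore, by combining (2.9) and (2.10), we necessarily have that

$$\frac{1}{2} d^2_{W_2}(\mu^*, \mu_\theta) g(\theta) = \int_\Omega S_{g(\theta)} f_\theta(x) \, d\mu_\theta(x) + \int_\Omega f_\theta(x) \, d\mu^*(x), \tag{2.11}$$

for every $\theta \in \Theta_g$.

Now, let $\gamma_\theta \in \Pi(\mu_\theta, \mu^*)$ be an optimal transport plan between $\mu_\theta$ and $\mu^*$. By definition of $\gamma_\theta$
and by (2.11), one obtains that for every $\theta \in \Theta_g$

$$\frac{g(\theta)}{2} \int_{\Omega \times \Omega} |x - y|^2 \, d\gamma_\theta(x, y) = \frac{g(\theta)}{2} d^2_{W_2}(\mu_\theta, \mu^*) = \int_\Omega S_{g(\theta)} f_\theta(x) \, d\mu_\theta(x) + \int_\Omega f_\theta(y) \, d\mu^*(y) = \int_{\Omega \times \Omega} \left( S_{g(\theta)} f_\theta(x) + f_\theta(y) \right) d\gamma_\theta(x, y). \tag{2.12}$$

Since $\frac{g(\theta)}{2} |x - y|^2 \ge S_{g(\theta)} f_\theta(x) + f_\theta(y)$ (by definition of $S_{g(\theta)} f_\theta(x)$), equality (2.12) implies that

$$\frac{g(\theta)}{2} |x - y|^2 = S_{g(\theta)} f_\theta(x) + f_\theta(y), \quad \gamma_\theta\text{-a.e.}, \tag{2.13}$$

where the notation $\gamma_\theta$-a.e. means that the above equality holds for all $(x, y)$ in a set $A_\theta \subset \Omega \times \Omega$
of measure $\gamma_\theta(A_\theta) = 1$.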
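It is not difficult to check that $S_{g(\theta)}(S_{g(\theta)} f_\theta) \ge f_\theta$. Therefore, by equality (2.13), one obtains
that

$$f_\theta(y) = \frac{g(\theta)}{2} |x - y|^2 - S_{g(\theta)} f_\theta(x) \ge S_{g(\theta)}(S_{g(\theta)} f_\theta)(y), \quad \gamma_\theta\text{-a.e.},$$

and thus

$$f_\theta = S_{g(\theta)}(S_{g(\theta)} f_\theta), \quad \mu^*\text{-a.e.}, \tag{2.14}$$

for every $\theta \in \Theta_g$. Thus, by the constraint that $\int_\Theta f_\theta(x) \, d\theta = 0$ for all $x \in \Omega$, one has that

$$\int_\Theta S_{g(\theta)}(S_{g(\theta)} f_\theta)(x) \, d\theta = 0, \quad \mu^*\text{-a.e.} \tag{2.15}$$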

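For every $\theta \in \Theta_g$, introduce the convex function $\phi_\theta$ defined by

$$\phi_\theta(x) = \frac{1}{2} |x|^2 - \frac{1}{g(\theta)} S_{g(\theta)} f_\theta(x), \tag{2.16}$$

and its conjugate $\phi^*_\theta$ defined by

$$\phi^*_\theta(y) = \frac{1}{2} |y|^2 - \frac{1}{g(\theta)} S_{g(\theta)}(S_{g(\theta)} f_\theta)(y).$$

Let us denote by

$$\partial\phi_\theta = \{ (x, y) \in \Omega \times \Omega \, : \, \phi_\theta(x) + \phi^*_\theta(y) = \langle x, y \rangle \}$$

the graph of its subdifferential. Let $(x, y)$ be in the support of the measure $\gamma_\theta$. By (2.13) and
(2.14) it follows that

$$g(\theta) \langle x, y \rangle = -S_{g(\theta)} f_\theta(x) + \frac{g(\theta)}{2} |x|^2 - f_\theta(y) + \frac{g(\theta)}{2} |y|^2 = g(\theta) \phi_\theta(x) - S_{g(\theta)}(S_{g(\theta)} f_\theta)(y) + \frac{g(\theta)}{2} |y|^2 = g(\theta) \phi_\theta(x) + g(\theta) \phi^*_\theta(y). \tag{2.17}$$

By equality (2.17), it follows that if $\theta \in \Theta_g$, then $(x, y) \in \partial\phi_\theta$, which shows that the support
of $\gamma_\theta$ is included in $\partial\phi_\theta$. Moreover, one can check that if $\theta \in \Theta_g$, then $\phi_\theta$ is the solution of

$$\phi_\theta = \arg\min_{\phi \in \mathcal{C}} \left\{ \int_\Omega \phi(x) \, d\mu_\theta(x) + \int_\Omega \phi^*(x) \, d\mu^*(x) \right\}, \tag{2.18}$$

where $\mathcal{C}$ denotes the set of convex functions $\phi : \Omega \to \mathbb{R}$ that are lower semi-continuous.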

Thanks to Assumption 1, the measure $\mu_\theta$ admits a density with respect to the Lebesgue
measure for every $\theta \in \Theta$. Then, let us recall that we have shown previously that, if $\theta \in \Theta_g$, then
the support of the optimal transport plan $\gamma_\theta$ between $\mu_\theta$ and $\mu^*$ is included in $\partial\phi_\theta$. Hence, by
Proposition 2.1, it follows that there exists a unique convex function $\phi_\theta : \Omega \to \mathbb{R}$, solution of the
optimisation problem (2.18), such that

$$\mu^* = \nabla\phi_\theta \# \mu_\theta \tag{2.19}$$

for every $\theta$ in the support $\Theta_g$ of $g$. Since the convex function $\phi_\theta$ is defined by equation (2.16),
it is clear that $\phi_\theta$ does not depend on $\mu^*$ but only on $f_\theta$ and $g(\theta)$ for $\theta \in \Theta_g$. Therefore, by
equation (2.19), the population barycenter $\mu^*$ is necessarily unique, which completes the proof
of Theorem 2.1.

□ 
3 Convergence of the empirical barycenter

Let us now prove the convergence of the empirical barycenter for the set of compactly supported
measures introduced in Section 2. Let $\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_n$ be iid random variables in $\mathbb{R}^p$ with distribution
$\mathbb{P}_\Theta$. Then, let us define the functional

$$J_n(\nu) = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{2} d^2_{W_2}(\nu, \mu_{\boldsymbol{\theta}_j}), \tag{3.1}$$

and consider the optimization problem: find an empirical barycenter

$$\bar{\mu}_n \in \arg\min_{\nu \in \mathcal{M}_+(\Omega)} J_n(\nu) = \arg\min_{\nu \in \mathcal{M}_+(\Omega)} \frac{1}{n} \sum_{j=1}^{n} \frac{1}{2} d^2_{W_2}(\nu, \mu_{\boldsymbol{\theta}_j}). \tag{3.2}$$

Thanks to the results in [2], the following lemma holds:

Lemma 3.1. Suppose that Assumption 1 holds. Then, for any $n \ge 1$, there exists a unique
minimizer $\bar{\mu}_n$ of $J_n(\cdot)$ over $\mathcal{M}_+(\Omega)$.

Let us now give our main result on the convergence of the empirical barycenter $\bar{\mu}_n$.

Theorem 3.1. Suppose that Assumption 1 and Assumption 2 hold. Let $\mu^*$ be the population
barycenter defined by (2.2), and $\bar{\mu}_n$ be the empirical barycenter defined by (3.2). Then,

$$\lim_{n \to +\infty} d_{W_2}(\bar{\mu}_n, \mu^*) = 0 \quad \text{almost surely (a.s.)}$$

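Proof. Part of the proof is inspired by the proof of Theorem 1 in [15]. For $\nu \in \mathcal{M}_+(\Omega)$, let
us define

$$\Delta_n(\nu) = J_n(\nu) - J(\nu).$$

The proof is divided into two steps. First, we prove the uniform convergence to zero of $\Delta_n$ over
$\mathcal{M}_+(\Omega)$. Then, we show that any converging subsequence of $\bar{\mu}_n$ converges a.s. to $\mu^*$ for the
2-Wasserstein distance.

Step 1. For $\nu \in \mathcal{M}_+(\Omega)$, let us denote by $f_\nu : \mathcal{M}_+(\Omega) \to \mathbb{R}$ the real-valued function defined by

$$f_\nu(\mu) = \frac{1}{2} d^2_{W_2}(\nu, \mu).$$

Then, let us define the following class of functions

$$\mathcal{F} = \{ f_\nu, \; \nu \in \mathcal{M}_+(\Omega) \}.$$

Since $\Omega$ is compact with diameter $\delta(\Omega)$, $\mathcal{F}$ is a class of functions uniformly bounded by $\frac{1}{2} \delta^2(\Omega)$
(for the supremum norm). Now, let $\nu, \mu, \mu' \in \mathcal{M}_+(\Omega)$. By the reverse triangle inequality

$$|f_\nu(\mu) - f_\nu(\mu')| = \frac{1}{2} \left| d^2_{W_2}(\nu, \mu) - d^2_{W_2}(\nu, \mu') \right| \le \delta(\Omega) \left| d_{W_2}(\nu, \mu) - d_{W_2}(\nu, \mu') \right| \le \delta(\Omega) \, d_{W_2}(\mu, \mu').$$

The above inequality proves that $\mathcal{F}$ is an equicontinuous family of functions. Now, let $\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_n$
be iid random vectors in $\mathbb{R}^p$ with density $g$, and let us define the random empirical measure on
$\mathcal{M}_+(\Omega)$

$$\mathbb{P}^n_g = \frac{1}{n} \sum_{j=1}^{n} \delta_{\mu_{\boldsymbol{\theta}_j}},$$

where $\delta_\mu$ denotes the Dirac measure at $\nu = \mu$. It is clear that

$$\Delta_n(\nu) = \int_{\mathcal{M}_+(\Omega)} f_\nu(\mu) \, d\mathbb{P}^n_g(\mu) - \int_{\mathcal{M}_+(\Omega)} f_\nu(\mu) \, d\mathbb{P}_g(\mu).$$

Let $f : \mathcal{M}_+(\Omega) \to \mathbb{R}$ be a real-valued function that is continuous (for the topology induced by
$d_{W_2}$) and bounded. Thanks to the measurability of the mapping $\phi$, one has that the real random
variable $\int_{\mathcal{M}_+(\Omega)} f(\mu) \, d\mathbb{P}^n_g(\mu)$ converges a.s. to $\int_{\mathcal{M}_+(\Omega)} f(\mu) \, d\mathbb{P}_g(\mu)$ as $n \to +\infty$, meaning that the
random measure $\mathbb{P}^n_g$ a.s. converges to $\mathbb{P}_g$ in the weak sense. Therefore, since $\mathcal{F}$ is a uniformly
bounded and equicontinuous family of functions, one can use Theorem 6.2 in [25] to obtain that

$$\sup_{\nu \in \mathcal{M}_+(\Omega)} |\Delta_n(\nu)| = \sup_{f \in \mathcal{F}} \left| \int_{\mathcal{M}_+(\Omega)} f(\mu) \, d\mathbb{P}^n_g(\mu) - \int_{\mathcal{M}_+(\Omega)} f(\mu) \, d\mathbb{P}_g(\mu) \right| \to 0 \quad \text{as } n \to +\infty, \text{ a.s.}, \tag{3.3}$$

which proves the uniform convergence of $\Delta_n$ to zero over $\mathcal{M}_+(\Omega)$.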



Step 2. Suppose that Assumption 1 and Assumption 2 hold. By Lemma 3.1, there exists a
unique sequence $(\bar{\mu}_n)_{n \ge 1}$ of empirical barycenters defined by (3.2). Thanks to the compactness
of the Wasserstein space $(\mathcal{M}_+(\Omega), d_{W_2})$, one can extract a converging sub-sequence of empirical
barycenters $(\bar{\mu}_{n_k})_{k \ge 1}$ such that $\lim_{k \to +\infty} d_{W_2}(\bar{\mu}_{n_k}, \bar{\mu}) = 0$ for some measure $\bar{\mu} \in \mathcal{M}_+(\Omega)$.
Let us now prove that $\bar{\mu} = \mu^*$. To this end, let us first note that by the definition of $\bar{\mu}_{n_k}$ and
$\mu^*$ as the unique minimizers of $J_{n_k}(\cdot)$ and $J(\cdot)$ respectively, it follows that

$$|J(\bar{\mu}_{n_k}) - J(\mu^*)| = J(\bar{\mu}_{n_k}) - J_{n_k}(\bar{\mu}_{n_k}) + J_{n_k}(\bar{\mu}_{n_k}) - J_{n_k}(\mu^*) + J_{n_k}(\mu^*) - J(\mu^*) \le 2 \sup_{\nu \in \mathcal{M}_+(\Omega)} |\Delta_{n_k}(\nu)|,$$

where we have used the fact that $J_{n_k}(\bar{\mu}_{n_k}) - J_{n_k}(\mu^*) \le 0$. Therefore, thanks to the uniform
convergence (3.3) of $\Delta_n$ to zero over $\mathcal{M}_+(\Omega)$, one obtains that

$$\lim_{k \to +\infty} J(\bar{\mu}_{n_k}) = J(\mu^*). \tag{3.4}$$

Therefore, using that

$$|J_{n_k}(\bar{\mu}_{n_k}) - J(\mu^*)| \le |J_{n_k}(\bar{\mu}_{n_k}) - J(\bar{\mu}_{n_k})| + |J(\bar{\mu}_{n_k}) - J(\mu^*)| \le \sup_{\nu \in \mathcal{M}_+(\Omega)} |\Delta_{n_k}(\nu)| + |J(\bar{\mu}_{n_k}) - J(\mu^*)|,$$

one finally obtains by (3.3) and (3.4) that

$$\lim_{k \to +\infty} J_{n_k}(\bar{\mu}_{n_k}) = J(\mu^*). \tag{3.5}$$
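Since $|J_{n_k}(\bar{\mu}) - J(\bar{\mu})| \le \sup_{\nu \in \mathcal{M}_+(\Omega)} |\Delta_{n_k}(\nu)|$, it follows by equation (3.3) that

$$\lim_{k \to +\infty} J_{n_k}(\bar{\mu}) = J(\bar{\mu}) \quad \text{a.s.} \tag{3.6}$$

Moreover, for any $\epsilon > 0$, there exists $k_\epsilon \in \mathbb{N}$ such that $d_{W_2}(\bar{\mu}_{n_k}, \bar{\mu}) \le \epsilon$ for all $k \ge k_\epsilon$. Therefore,
using the triangle inequality, it follows that for all $k \ge k_\epsilon$

$$\left( J_{n_k}(\bar{\mu}) \right)^{1/2} = \left( \frac{1}{n_k} \sum_{j=1}^{n_k} \frac{1}{2} d^2_{W_2}(\bar{\mu}, \mu_{\boldsymbol{\theta}_j}) \right)^{1/2} \le \left( \frac{1}{n_k} \sum_{j=1}^{n_k} \frac{1}{2} d^2_{W_2}(\bar{\mu}, \bar{\mu}_{n_k}) \right)^{1/2} + \left( \frac{1}{n_k} \sum_{j=1}^{n_k} \frac{1}{2} d^2_{W_2}(\bar{\mu}_{n_k}, \mu_{\boldsymbol{\theta}_j}) \right)^{1/2} \le \frac{\epsilon}{\sqrt{2}} + \left( J_{n_k}(\bar{\mu}_{n_k}) \right)^{1/2},$$

and thus by equations (3.5) and (3.6), we obtain that

$$J(\bar{\mu}) \le \lim_{k \to +\infty} J_{n_k}(\bar{\mu}_{n_k}) = J(\mu^*) \quad \text{a.s.}, \tag{3.7}$$

which finally proves that $\bar{\mu} = \mu^*$ a.s. since $\mu^*$ is the unique minimizer of $J(\nu)$ over $\nu \in \mathcal{M}_+(\Omega)$.

Hence, any converging subsequence of empirical barycenters converges a.s. to $\mu^*$ for the
2-Wasserstein distance. Since $(\mathcal{M}_+(\Omega), d_{W_2})$ is compact, this finally shows that $(\bar{\mu}_n)_{n \ge 1}$ is a
converging sequence such that $\lim_{n \to +\infty} d_{W_2}(\bar{\mu}_n, \mu^*) = 0$ a.s., which completes the proof of
Theorem 3.1.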

□ 
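The convergence stated in Theorem 3.1 can be observed in a toy simulation. The sketch below is an illustration under explicit assumptions that are not taken from the paper: the dimension is one, the random measures are push-forwards of the uniform distribution on $[0, 1]$ by random increasing affine maps $x \mapsto a x + b$, and barycenters are computed through the one-dimensional identity "quantile function of the barycenter = average of the quantile functions". All function names are hypothetical.

```python
import numpy as np

def quantiles_affine(a, b, u):
    """Quantile function of the push-forward of Uniform(0, 1) by x -> a*x + b, a > 0."""
    return a * u + b

def w2_from_quantiles(q1, q2):
    """1D W2 distance, approximated by the L2 distance between quantile functions."""
    return float(np.sqrt(np.mean((q1 - q2) ** 2)))

def empirical_barycenter_error(n, rng, u):
    """W2 distance between the empirical barycenter of n draws and the population one."""
    a = rng.uniform(0.5, 1.5, size=n)    # random scales, E[a] = 1
    b = rng.uniform(-1.0, 1.0, size=n)   # random shifts, E[b] = 0
    q_emp = quantiles_affine(a[:, None], b[:, None], u[None, :]).mean(axis=0)
    q_pop = quantiles_affine(1.0, 0.0, u)  # average quantile function under the model
    return w2_from_quantiles(q_emp, q_pop)

u = (np.arange(1000) + 0.5) / 1000       # midpoint grid on (0, 1)
rng = np.random.default_rng(0)
errors = {n: empirical_barycenter_error(n, rng, u) for n in (10, 100, 10_000)}
```

In this toy model the error is bounded by $|\bar{a}_n - 1| + |\bar{b}_n|$, where $\bar{a}_n$ and $\bar{b}_n$ are the sample means of the scales and shifts, so the almost sure convergence reduces to the strong law of large numbers for these two averages.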



4 Application to statistical models for curve and image warping 

In this section, we present some statistical models for which the notions of population and
empirical barycenters in the 2-Wasserstein space are useful.

In many applications, observations are in the form of a set of n gray-level curves or images
$X_1, \ldots, X_n$ (e.g. in geophysics, biomedical imaging or in signal processing for neurosciences),
which can be considered as iid random variables belonging to the set $L^2(\Omega)$ of square-integrable
and real-valued functions on a compact domain $\Omega$ of $\mathbb{R}^d$. In many situations the observed curves
or images share the same structure. This may lead to the assumption that these observations are
random elements which vary around the same but unknown mean pattern (also called reference
template). Estimating such a mean pattern and characterizing the modes of individual variations
around this template is of fundamental interest.

Due to additive noise and geometric variability in the data, this mean pattern is typically
unknown, and it has to be estimated. In this setting, a widely used approach is Grenander's
pattern theory [17, 18, 29, 30] that models geometric variability by the action of a Lie group on
an infinite dimensional space of curves (or images). Following the ideas of Grenander's pattern
theory, a simple assumption is to consider that the data $X_1, \ldots, X_n$ are obtained through the
deformation of the same reference template $h \in L^2(\Omega)$ via the so-called deformable model

$$X_i = h \circ \boldsymbol{\varphi}_i^{-1}, \quad i = 1, \ldots, n, \tag{4.1}$$

where $\boldsymbol{\varphi}_1, \ldots, \boldsymbol{\varphi}_n$ are iid random variables belonging to the set of smooth diffeomorphisms of
$\Omega$. In signal and image processing, there has recently been a growing interest in the statistical
analysis of deformable models (4.1) using either rigid or non-rigid random diffeomorphisms $\boldsymbol{\varphi}_i$,
see e.g. [32] and references therein. In a data set of curves or images, one
generally observes not only a source of variability in geometry, but also a source of photometric
variability (the intensity of a pixel changes from one image to another) that cannot be
captured solely by a deformation of the domain $\Omega$ via a diffeomorphism as in model (4.1).

It is always possible to transform the data $X_1, \ldots, X_n$ into a set of n iid random probability
densities by computing the random variables

$$Y_i(x) = \frac{\tilde{X}_i(x)}{\int_\Omega \tilde{X}_i(u) \, du}, \quad x \in \Omega, \quad \text{where } \tilde{X}_i(x) = X_i(x) - \min_{u \in \Omega} \{ X_i(u) \}, \quad i = 1, \ldots, n.$$
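The normalisation above is straightforward to carry out on sampled data. Here is a minimal sketch, assuming the curve is observed on a regular one-dimensional grid with spacing `dx` and that the integral is approximated by a Riemann sum; the function name `to_density` and the example curve are illustrative assumptions.

```python
import numpy as np

def to_density(x_vals, dx):
    """Turn a sampled curve into a probability density on the same grid.

    The minimum is subtracted so that the curve is non-negative, then the
    result is divided by a Riemann-sum approximation of its integral.
    """
    shifted = np.asarray(x_vals, dtype=float)
    shifted = shifted - shifted.min()
    total = shifted.sum() * dx
    if total <= 0.0:
        raise ValueError("a constant curve cannot be normalised this way")
    return shifted / total

grid = np.linspace(0.0, 1.0, 201)
dx = grid[1] - grid[0]
curve = 3.0 + np.sin(2.0 * np.pi * grid)   # an arbitrary gray-level curve
density = to_density(curve, dx)
```

Note that a constant curve has no well-defined normalisation under this transform, which is why the sketch raises an error in that case.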

Let $q \in L^2(\Omega)$ be a probability density function, and consider the deformable model of densities

$$Y_i(x) = \left| \det\left( D\boldsymbol{\varphi}_i^{-1} \right)(x) \right| q\left( \boldsymbol{\varphi}_i^{-1}(x) \right), \quad x \in \Omega, \quad i = 1, \ldots, n, \tag{4.2}$$

where $\det\left( D\boldsymbol{\varphi}_i^{-1} \right)(x)$ denotes the determinant of the Jacobian matrix of the random
diffeomorphism $\boldsymbol{\varphi}_i^{-1}$ at point $x$. If we denote by $\boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_n \in \mathcal{M}_+(\Omega)$ the random probability measures
with densities $Y_1, \ldots, Y_n$, and by $\mu$ the measure with density $q$, then (4.2) can also be written
as the following deformable model of measures

$$\boldsymbol{\mu}_i = \boldsymbol{\varphi}_i \# \mu, \quad i = 1, \ldots, n. \tag{4.3}$$

In model (4.3), computing the empirical barycenter in the Wasserstein space of the random
measures $\boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_n$ may lead to consistent and meaningful estimators of the reference measure
$\mu$ and thus of the mean pattern $q$. In the rest of this section, we discuss some examples of
model (4.3). In particular, we show how the results of Section 2 can be used to characterise the
population barycenter of random measures satisfying the deformable model (4.3).

4.1 A parametric class of diffeomorphisms 

Let ii be a measure on M. d having a density q (with respect to the Lebesgue measure dx on M. d ) 
whose support is contained in compact set fi 9 C R''. We propose to characterise the population 
barycenter of a random measure fi satisfying the deformable model 

fi = <p#(i, (4.4) 

for a specific class of random diffeomorphisms if : M. d — > Mr. Let §^~(R) be the set of non-negative 
definite d x d symmetric matrices with real entries. Let 

?)) -> (S+(R) x R d ,B fst(R) x I 
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be a measurable mapping, where B(S£(R) x R d ) is the Borel <r-algebra of Sj(R) x R d . For 
9 G R p , we will use the notations 

4>(9) = (Ag,b g ) , with A e G §+(R), 6 e G R d , 

and 

V9 e (x) = A x + b e , x G R d . 
For any 9 £ R d , one has that ipg : R d — > R d is a smooth and bijective affine application with 

<pj 1 (x) = Aj 1 (x-b e ), x G R d . 

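Let $\Theta \subset \mathbb{R}^p$ be a compact set. One can then define a parametric class of diffeomorphisms of $\mathbb{R}^d$ as follows

$$\mathcal{D}_\phi(\Theta) = \left\{\varphi_\theta, \ \theta \in \Theta\right\}. \qquad (4.5)$$

Finally, let $\boldsymbol{\theta} \in \mathbb{R}^p$ be a random vector with density $g$ (with respect to the Lebesgue measure $d\theta$ on $\mathbb{R}^p$) having a support included in the compact set $\Theta$. We propose to study the population barycenter in the 2-Wasserstein space of the random measure $\mu_{\boldsymbol{\theta}}$ satisfying the deformable model

$$\mu_{\boldsymbol{\theta}} = \varphi_{\boldsymbol{\theta}} \# \mu. \qquad (4.6)$$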

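For any $\theta \in \Theta$ (not necessarily a random vector), we define $\mu_\theta = \varphi_\theta \# \mu$. Since $\varphi_\theta$ is a smooth diffeomorphism and $\mu$ is a measure with density $q$ whose support is included in the compact set $\Omega_q$, it follows that $\mu_\theta$ admits a density $q_\theta$ on $\mathbb{R}^d$ given by

$$q_\theta(x) = \begin{cases} \det\left(A_\theta^{-1}\right) q\left(A_\theta^{-1}(x - b_\theta)\right) & \text{if } x \in \mathcal{R}(\varphi_\theta), \\ 0 & \text{if } x \notin \mathcal{R}(\varphi_\theta), \end{cases} \qquad (4.7)$$

where $\mathcal{R}(\varphi_\theta) = \{\varphi_\theta(y), \ y \in \Omega_q\} = \{A_\theta y + b_\theta, \ y \in \Omega_q\}$. Before stating our main result on the population barycenter of the random measure $\mu_{\boldsymbol{\theta}}$ defined by (4.6), let us make the following regularity assumption on the mapping $\phi$.

Assumption 3. The mapping $\phi : \Theta \to \mathbb{S}_d^+(\mathbb{R}) \times \mathbb{R}^d$ is continuous.

Under Assumption 3, it follows that there exists a compact set $\Omega \subset \mathbb{R}^d$ such that $\mathcal{R}(\varphi_\theta) \subset \Omega$ for all $\theta \in \Theta$. Thus, under this assumption, the random measure $\mu_{\boldsymbol{\theta}}$ takes its values in $\mathcal{M}_+(\Omega)$.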
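The change-of-variables formula (4.7) can be checked numerically in dimension one. The sketch below is only an illustration under stated assumptions: the triangular template on $[-1, 1]$, the values $a = 2$ and $b = 0.5$, and the helper names `template_density`, `pushforward_density` and `integrate` are hypothetical choices that do not come from the paper.

```python
def template_density(x):
    """Hypothetical template q: triangular density on [-1, 1] (mean 0)."""
    return max(0.0, 1.0 - abs(x))


def pushforward_density(q, a, b):
    """Density of phi#mu for phi(x) = a*x + b with a > 0, i.e. (4.7) with d = 1."""
    if a <= 0:
        raise ValueError("a must be positive so that phi is invertible")
    # det(A^{-1}) q(A^{-1}(x - b)) reduces to q((x - b) / a) / a when d = 1.
    return lambda x: q((x - b) / a) / a


def integrate(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


a, b = 2.0, 0.5
q_theta = pushforward_density(template_density, a, b)

# The deformed density is supported on [a*(-1) + b, a*1 + b] = [-1.5, 2.5].
mass = integrate(q_theta, -5.0, 5.0)
mean = integrate(lambda x: x * q_theta(x), -5.0, 5.0)
print(mass, mean)
```

In this toy setting the deformed density should still integrate to one, and its mean should equal $b$ because the template is centered.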

4.2 Characterization of the population barycenter for parametric diffeomorphisms

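Let us now give a characterization of the population barycenter of a random measure following the deformable model (4.6) with random diffeomorphisms $\varphi_{\boldsymbol{\theta}}$ taking their values in the parametric class defined by (4.5). Before stating the main result of this section, we define, for any $\theta \in \Theta$, the following quantities

$$\tilde{A}_\theta = A_\theta \bar{A}^{-1} \quad \text{and} \quad \tilde{b}_\theta = b_\theta - A_\theta \bar{A}^{-1} \bar{b}, \qquad (4.8)$$

where $\bar{A} = \mathbb{E}(A_{\boldsymbol{\theta}})$ and $\bar{b} = \mathbb{E}(b_{\boldsymbol{\theta}})$.

Theorem 4.1. Let $\boldsymbol{\theta} \in \mathbb{R}^p$ be a random vector with a density $g : \Theta \to \mathbb{R}$ that is continuously differentiable and such that $g(\theta) > 0$ for all $\theta \in \Theta$. Let $\mu_{\boldsymbol{\theta}}$ be the random measure defined by the deformable model (4.6). Suppose that Assumption 3 holds.

Then, the population barycenter $\mu^*$ defined by (2.2) exists and is unique. Moreover, let us define the density

$$q^*(x) = \det\left(\bar{A}^{-1}\right) q\left(\bar{A}^{-1}(x - \bar{b})\right), \quad x \in \Omega, \qquad (4.9)$$

where $\bar{A} = \mathbb{E}(A_{\boldsymbol{\theta}})$, $\bar{b} = \mathbb{E}(b_{\boldsymbol{\theta}})$, and $\tilde{A}_{\boldsymbol{\theta}}$, $\tilde{b}_{\boldsymbol{\theta}}$ are the random variables defined by (4.8). Then, the following statements hold:

1. The primal problem $(\mathcal{P})$ satisfies

$$J_{\mathcal{P}} := \inf_{\nu \in \mathcal{M}_+(\Omega)} J(\nu) = \frac{1}{2} \int_\Theta d^2_{W_2}(\mu^*, \mu_\theta) \, g(\theta) \, d\theta = \frac{1}{2} \int_\Omega \mathbb{E}\left(\left|\tilde{A}_{\boldsymbol{\theta}} u + \tilde{b}_{\boldsymbol{\theta}} - u\right|^2\right) q^*(u) \, du, \qquad (4.10)$$

and the dual problem $(\mathcal{P}^*)$ admits a maximizer at $f^\Theta = (f_\theta)_{\theta \in \Theta} \in L^1(\Theta, X)$ where, for $\theta \in \Theta$,

$$f_\theta(x) = -\frac{g(\theta)}{2} \left\langle (\tilde{A}_\theta - I) x, x \right\rangle - g(\theta) \left\langle \tilde{b}_\theta, x \right\rangle, \quad x \in \Omega, \qquad (4.11)$$

where $I$ is the $d \times d$ identity matrix.

2. The population barycenter is the measure $\mu^* \in \mathcal{M}_+(\Omega)$ with density $q^*$ (with respect to the Lebesgue measure on $\mathbb{R}^d$) given by equation (4.9).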

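Proof. Under the assumptions of Theorem 4.1, it is clear that Assumption 1 and Assumption 2 are satisfied. Therefore, by Theorem 2.1, there exists a unique population barycenter $\mu^*$ of the random measure $\mu_{\boldsymbol{\theta}}$ defined by (4.6). To prove the results stated in Theorem 4.1, we will use the characterization (2.8) of the barycenter $\mu^*$. For this purpose, we need to find a maximizer $f^\Theta = (f_\theta)_{\theta \in \Theta} \in L^1(\Theta, X)$ of the dual problem $(\mathcal{P}^*)$.

Let $\bar{A} = \mathbb{E}(A_{\boldsymbol{\theta}})$ and $\bar{b} = \mathbb{E}(b_{\boldsymbol{\theta}})$. By defining the density

$$\bar{q}(x) = \det\left(\bar{A}^{-1}\right) q\left(\bar{A}^{-1}(x - \bar{b})\right), \quad x \in \mathbb{R}^d,$$

one can re-parametrize the density $q_\theta$, given by (4.7) for any $\theta \in \Theta$, as follows

$$q_\theta(x) = \det\left(\tilde{A}_\theta^{-1}\right) \bar{q}\left(\tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)\right), \quad x \in \mathbb{R}^d, \qquad (4.12)$$

where

$$\tilde{A}_\theta = A_\theta \bar{A}^{-1} \quad \text{and} \quad \tilde{b}_\theta = b_\theta - A_\theta \bar{A}^{-1} \bar{b}.$$

In the proof, we will denote by $\Omega_{\bar{q}}$ the support of the density $\bar{q}$. Note that the random variables $\tilde{A}_{\boldsymbol{\theta}}$ and $\tilde{b}_{\boldsymbol{\theta}}$ are such that

$$\mathbb{E}\left(\tilde{A}_{\boldsymbol{\theta}}\right) = \int_\Theta \tilde{A}_\theta \, g(\theta) \, d\theta = I \quad \text{and} \quad \mathbb{E}\left(\tilde{b}_{\boldsymbol{\theta}}\right) = \int_\Theta \tilde{b}_\theta \, g(\theta) \, d\theta = 0,$$

where $I$ denotes the $d \times d$ identity matrix.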

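Proof of statement 1 of Theorem 4.1.

a) Let us first compute an upper bound of $J_{\mathcal{P}^*}$. Let $f^\Theta \in L^1(\Theta, X)$ be such that $\int_\Theta f_\theta(x) \, d\theta = 0$ for all $x \in \Omega$. By definition of $S_{g(\theta)} f_\theta(x)$ one has that

$$S_{g(\theta)} f_\theta(x) \le \frac{g(\theta)}{2} |x - y|^2 - f_\theta(y) \qquad (4.13)$$

for any $y \in \Omega$. By using equation (4.12) and inequality (4.13) with $y = \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)$ one obtains that

$$\begin{aligned}
\int_\Theta \int_\Omega S_{g(\theta)} f_\theta(x) \, q_\theta(x) \, dx \, d\theta
&\le \int_\Theta \int_\Omega \left[ \frac{g(\theta)}{2} \left|x - \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)\right|^2 - f_\theta\left(\tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)\right) \right] q_\theta(x) \, dx \, d\theta \\
&\le \int_\Theta \int_{\Omega_{\bar{q}}} \left[ \frac{g(\theta)}{2} \left|\tilde{A}_\theta u + \tilde{b}_\theta - u\right|^2 - f_\theta(u) \right] \bar{q}(u) \, du \, d\theta \\
&\le \int_\Theta \int_{\Omega_{\bar{q}}} \frac{g(\theta)}{2} \left|\tilde{A}_\theta u + \tilde{b}_\theta - u\right|^2 \bar{q}(u) \, du \, d\theta.
\end{aligned}$$

Note that to obtain the second inequality above, we have used the change of variable $u = \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)$, while the third inequality has been obtained using the fact that $\int_\Theta f_\theta(u) \, d\theta = 0$ for any $u \in \Omega_{\bar{q}}$ combined with Fubini's theorem. Thanks to the compactness of $\Theta$ and $\Omega_{\bar{q}}$, and using Assumption 2, it follows that

$$\int_{\Omega_{\bar{q}}} \mathbb{E}\left(\left|\tilde{A}_{\boldsymbol{\theta}} u + \tilde{b}_{\boldsymbol{\theta}} - u\right|^2\right) \bar{q}(u) \, du < +\infty.$$

Therefore, we have shown that

$$J_{\mathcal{P}^*} \le \frac{1}{2} \int_{\Omega_{\bar{q}}} \mathbb{E}\left(\left|\tilde{A}_{\boldsymbol{\theta}} u + \tilde{b}_{\boldsymbol{\theta}} - u\right|^2\right) \bar{q}(u) \, du. \qquad (4.14)$$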

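b) Let us recall that we have assumed that $g(\theta) > 0$ for any $\theta \in \Theta$. Now, for $\theta \in \Theta$ we define the function

$$f_\theta(x) = -\frac{g(\theta)}{2} \left\langle (\tilde{A}_\theta - I) x, x \right\rangle - g(\theta) \left\langle \tilde{b}_\theta, x \right\rangle.$$

First, one can note that $f^\Theta = (f_\theta)_{\theta \in \Theta}$ belongs to $L^1(\Theta, X)$. Since $\int_\Theta \tilde{A}_\theta \, g(\theta) \, d\theta = I$ and $\int_\Theta \tilde{b}_\theta \, g(\theta) \, d\theta = 0$, one has also that $\int_\Theta f_\theta(x) \, d\theta = 0$. Let us now consider the function $F : \mathbb{R}^d \to \mathbb{R}$ defined as

$$F(y) = \frac{g(\theta)}{2} |x - y|^2 + \frac{g(\theta)}{2} \left\langle (\tilde{A}_\theta - I) y, y \right\rangle + g(\theta) \left\langle \tilde{b}_\theta, y \right\rangle, \quad y \in \mathbb{R}^d.$$

Searching for some $y \in \mathbb{R}^d$ where the gradient of $F$ vanishes leads to the equation

$$0 = -g(\theta)(x - y) + g(\theta)\left((\tilde{A}_\theta - I) y + \tilde{b}_\theta\right) = -g(\theta) x + g(\theta)\left(\tilde{A}_\theta y + \tilde{b}_\theta\right).$$

Hence, the function $y \mapsto F(y)$ has a minimum at $y = \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)$. Therefore,

$$\begin{aligned}
S_{g(\theta)} f_\theta(x)
&= \frac{g(\theta)}{2} \left|x - \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)\right|^2 + \frac{g(\theta)}{2} \left\langle (\tilde{A}_\theta - I)\left(\tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)\right), \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta) \right\rangle + g(\theta) \left\langle \tilde{b}_\theta, \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta) \right\rangle \\
&= \frac{g(\theta)}{2} |x|^2 - g(\theta) \left\langle x, \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta) \right\rangle + \frac{g(\theta)}{2} \left|\tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)\right|^2 \\
&\quad + \frac{g(\theta)}{2} \left\langle (\tilde{A}_\theta - I)\left(\tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)\right), \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta) \right\rangle + g(\theta) \left\langle \tilde{b}_\theta, \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta) \right\rangle \\
&= \frac{g(\theta)}{2} |x|^2 - \frac{g(\theta)}{2} \left\langle x, \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta) \right\rangle + \frac{g(\theta)}{2} \left\langle \tilde{b}_\theta, \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta) \right\rangle \qquad (4.15) \\
&= \frac{g(\theta)}{2} \left|x - \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)\right|^2 + \frac{g(\theta)}{2} \left\langle x + \tilde{b}_\theta, \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta) \right\rangle - \frac{g(\theta)}{2} \left|\tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)\right|^2. \qquad (4.16)
\end{aligned}$$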

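Let us introduce the notation $J_{\mathcal{P}^*}(f^\Theta) = \int_\Theta \int_\Omega S_{g(\theta)} f_\theta(x) \, d\mu_\theta(x) \, d\theta$. By equation (4.16) and using the re-parametrization (4.12) of $q_\theta$ combined with the change of variable $u = \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)$, it follows that

$$\begin{aligned}
J_{\mathcal{P}^*}(f^\Theta)
&= \int_\Theta \int_\Omega \frac{g(\theta)}{2} \left|x - \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)\right|^2 q_\theta(x) \, dx \, d\theta + \int_\Theta \int_\Omega \frac{g(\theta)}{2} \left\langle x + \tilde{b}_\theta, \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta) \right\rangle q_\theta(x) \, dx \, d\theta \\
&\quad - \int_\Theta \int_\Omega \frac{g(\theta)}{2} \left|\tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)\right|^2 q_\theta(x) \, dx \, d\theta \\
&= \int_\Theta \int_{\Omega_{\bar{q}}} \frac{g(\theta)}{2} \left|\tilde{A}_\theta u + \tilde{b}_\theta - u\right|^2 \bar{q}(u) \, du \, d\theta + \int_\Theta \int_{\Omega_{\bar{q}}} \frac{g(\theta)}{2} \left\langle \tilde{A}_\theta u + 2 \tilde{b}_\theta, u \right\rangle \bar{q}(u) \, du \, d\theta \\
&\quad - \int_\Theta \int_{\Omega_{\bar{q}}} \frac{g(\theta)}{2} |u|^2 \, \bar{q}(u) \, du \, d\theta \\
&= \frac{1}{2} \int_{\Omega_{\bar{q}}} \mathbb{E}\left(\left|\tilde{A}_{\boldsymbol{\theta}} u + \tilde{b}_{\boldsymbol{\theta}} - u\right|^2\right) \bar{q}(u) \, du,
\end{aligned}$$

where we have used Fubini's theorem combined with the fact that $\int_\Theta \tilde{A}_\theta \, g(\theta) \, d\theta = I$ and $\int_\Theta \tilde{b}_\theta \, g(\theta) \, d\theta = 0$ to obtain the last equality.

Hence, thanks to the upper bound (4.14), we finally have that

$$J_{\mathcal{P}^*}(f^\Theta) = J_{\mathcal{P}^*} = \frac{1}{2} \int_{\Omega_{\bar{q}}} \mathbb{E}\left(\left|\tilde{A}_{\boldsymbol{\theta}} u + \tilde{b}_{\boldsymbol{\theta}} - u\right|^2\right) \bar{q}(u) \, du,$$

which proves that $f^\Theta$ is a maximizer of the dual problem $(\mathcal{P}^*)$, and this completes the proof of statement 1 of Theorem 4.1.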

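Proof of statement 2 of Theorem 4.1.

Since we have found a solution $f^\Theta = (f_\theta)_{\theta \in \Theta}$ of the dual problem $(\mathcal{P}^*)$, it follows from Theorem 2.1 that the population barycenter is given by $\mu^* = \nabla \phi_\theta \# \mu_\theta$ where

$$\phi_\theta(x) = \frac{1}{2} |x|^2 - \frac{1}{g(\theta)} S_{g(\theta)} f_\theta(x), \quad \text{for all } x \in \Omega,$$

for every $\theta \in \Theta$. By equation (4.15), one has that

$$\phi_\theta(x) = \frac{1}{2} \left\langle x, \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta) \right\rangle - \frac{1}{2} \left\langle \tilde{b}_\theta, \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta) \right\rangle = \frac{1}{2} \left\langle x - \tilde{b}_\theta, \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta) \right\rangle,$$

which implies that

$$\nabla \phi_\theta(x) = \tilde{A}_\theta^{-1}(x - \tilde{b}_\theta).$$

Since $\mu_\theta$ is the measure with density $q_\theta(x) = \det\left(\tilde{A}_\theta^{-1}\right) \bar{q}\left(\tilde{A}_\theta^{-1}(x - \tilde{b}_\theta)\right)$, one finally has that $\mu^*$ is a measure having a density $q^*$ given by

$$q^*(x) = \det\left(\tilde{A}_\theta\right) q_\theta\left(\tilde{A}_\theta x + \tilde{b}_\theta\right) = \bar{q}(x),$$

which completes the proof of statement 2 of Theorem 4.1.

□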

Hence, by Theorem 4.1, the population barycenter $\mu^*$ in the deformable model (4.6) is related to the template measure $\mu$ by the equation

$$\mu^* = \bar{\varphi} \# \mu, \quad \text{with } \bar{\varphi}(x) = \mathbb{E}(A_{\boldsymbol{\theta}}) x + \mathbb{E}(b_{\boldsymbol{\theta}}), \quad x \in \mathbb{R}^d.$$

Note that the mapping $\bar{\varphi}$ is the expectation of the random diffeomorphism $\varphi_{\boldsymbol{\theta}}(x) = A_{\boldsymbol{\theta}} x + b_{\boldsymbol{\theta}}$, $x \in \mathbb{R}^d$. Therefore, computing the population barycenter in the Wasserstein space of a measure from the deformable model (4.6) with such a random diffeomorphism amounts to transporting the template measure $\mu$ by the averaged amount of deformation measured by $\bar{\varphi}$. In the case where $\bar{\varphi}$ is the identity map (which corresponds to the assumption that $\mathbb{E}(A_{\boldsymbol{\theta}}) = I$, the $d \times d$ identity matrix, and $\mathbb{E}(b_{\boldsymbol{\theta}}) = 0$), the population barycenter $\mu^*$ is equal to the template measure $\mu$.
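The relation $\mu^* = \bar{\varphi} \# \mu$ has a simple empirical analogue in dimension one, where the 2-Wasserstein barycenter is obtained by averaging quantile functions (see Section 5). The sketch below is a toy check under stated assumptions, not the construction used in the proof: all measures are replaced by equal-size samples, the barycenter is computed by averaging order statistics, and the names `pushforward` and `barycenter_1d`, the sample sizes and the parameter ranges are hypothetical.

```python
import random


def pushforward(sample, a, b):
    """Apply the affine map phi(x) = a*x + b (with a > 0) to every atom."""
    return [a * x + b for x in sample]


def barycenter_1d(samples):
    """1D 2-Wasserstein barycenter of equal-size empirical measures.

    With uniform weights and the same number of atoms, averaging quantile
    functions amounts to averaging the order statistics.
    """
    ordered = [sorted(s) for s in samples]
    n, m = len(ordered), len(ordered[0])
    return [sum(s[k] for s in ordered) / n for k in range(m)]


rng = random.Random(0)
template = [rng.uniform(-1.0, 1.0) for _ in range(200)]

# Hypothetical random affine deformations with a > 0.
params = [(rng.uniform(0.5, 1.5), rng.uniform(-0.3, 0.3)) for _ in range(50)]
deformed = [pushforward(template, a, b) for a, b in params]

bary = barycenter_1d(deformed)

# Template transported by the averaged deformation.
a_bar = sum(a for a, _ in params) / len(params)
b_bar = sum(b for _, b in params) / len(params)
expected = sorted(pushforward(template, a_bar, b_bar))

max_gap = max(abs(u - v) for u, v in zip(bary, expected))
print(max_gap)
```

Here the empirical barycenter of the deformed samples coincides, up to rounding, with the template sample pushed forward by the averaged map, because an increasing affine map preserves the ordering of the atoms.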



4.3 The case of randomly shifted densities 

To illustrate Theorem 4.1, let us consider the simplest deformable model of randomly shifted curves or images with

$$\varphi_i^{-1}(x) = x - \theta_i, \quad x \in \mathbb{R}^d,$$

in equation (4.1) for some random shift $\theta_i \in \mathbb{R}^d$. This model has recently received a lot of attention in the literature, see e.g. [10], [32], since it represents a benchmark for the statistical analysis of deformable models. In the one-dimensional case ($d = 1$), the model of shifted curves has applications in various fields such as neurosciences [28] or biology [26].

Let $q : \mathbb{R}^d \to \mathbb{R}_+$ be a probability density function with compact support included in $[-A, A]^d$ for some constant $A > 0$. For $\boldsymbol{\theta}$ a random vector in $\mathbb{R}^d$, we define the random density

$$q_{\boldsymbol{\theta}}(x) = q(x - \boldsymbol{\theta}), \quad x \in \mathbb{R}^d, \qquad (4.17)$$

and the associated random measure

$$d\mu_{\boldsymbol{\theta}}(x) = q_{\boldsymbol{\theta}}(x) \, dx.$$

Note that equation (4.17) corresponds to the deformable model (4.6) with $\varphi_\theta(x) = x + \theta$, $x \in \mathbb{R}^d$.

Now let us suppose that $\boldsymbol{\theta}$ has a continuously differentiable density $g$ with compact support $\Theta = [-\epsilon, \epsilon]^d$ for some $\epsilon > 0$. If $\theta_1, \ldots, \theta_n$ is an iid sample of random shifts with density $g$, then the empirical Euclidean barycenter (standard notion of averaging) of the random densities $q_{\theta_1}, \ldots, q_{\theta_n}$ is the probability density given by

$$\bar{q}_n(x) = \frac{1}{n} \sum_{j=1}^n q(x - \theta_j). \qquad (4.18)$$

By the law of large numbers, one has that

$$\lim_{n \to \infty} \bar{q}_n(x) = \int_\Theta q(x - \theta) \, g(\theta) \, d\theta \quad \text{a.s. for any } x \in \mathbb{R}^d.$$

Therefore, the Euclidean barycenter $\bar{q}_n$ converges to the convolution of the reference template $q$ with the density $g$ of the random shift $\boldsymbol{\theta}$. Hence, under mild assumptions, $\bar{q}_n$ is not a consistent estimator of the mean pattern $q$.
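The blurring effect of Euclidean averaging can be seen in a small one-dimensional experiment. The sketch below is only illustrative: the triangular template, the uniform shifts on $[-0.5, 0.5]$ and the helper names are hypothetical choices. It uses the elementary fact that the variance of the mixture (4.18) equals the variance of $q$ plus the empirical variance of the shifts, so the averaged density is strictly more spread out than the template.

```python
import random


def template_density(x):
    """Hypothetical template q: triangular density on [-1, 1] (variance 1/6)."""
    return max(0.0, 1.0 - abs(x))


def integrate(f, lo, hi, n=4000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


def variance(density, lo, hi):
    """Variance of the probability density `density` supported in [lo, hi]."""
    mean = integrate(lambda x: x * density(x), lo, hi)
    return integrate(lambda x: (x - mean) ** 2 * density(x), lo, hi)


rng = random.Random(1)
shifts = [rng.uniform(-0.5, 0.5) for _ in range(100)]


def euclidean_average(x):
    """Empirical Euclidean barycenter (4.18) of the shifted densities."""
    return sum(template_density(x - t) for t in shifts) / len(shifts)


var_template = variance(template_density, -2.0, 2.0)
var_average = variance(euclidean_average, -2.0, 2.0)

shift_mean = sum(shifts) / len(shifts)
var_shifts = sum((t - shift_mean) ** 2 for t in shifts) / len(shifts)

print(var_template, var_average, var_template + var_shifts)
```

The excess variance does not vanish as the number of shifts grows, which is the one-dimensional signature of the convolution limit described above.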

Let us now see the benefits of using the notion of empirical barycenter in the 2-Wasserstein space to consistently estimate $q$. It is clear that the set of shifted measures $(\mu_\theta)_{\theta \in \Theta}$ with densities $q_\theta(x) = q(x - \theta)$ is included in $\mathcal{M}_+(\Omega)$ with $\Omega = [-(A + \epsilon), (A + \epsilon)]^d$. Hence, Assumption 1 and Assumption 2 are satisfied. It is also clear that the mapping $\phi : \Theta \to \mathbb{S}_d^+(\mathbb{R}) \times \mathbb{R}^d$ defined by

$$\phi(\theta) = (I, \theta), \quad \theta \in \Theta, \quad \text{where } I \text{ is the } d \times d \text{ identity matrix},$$

is continuous, and thus Assumption 3 holds. Therefore, by Theorem 4.1, one immediately has the following result:

Corollary 4.1. Suppose that $\boldsymbol{\theta}$ is a random vector in $\mathbb{R}^d$ having a continuously differentiable density $g$ (with respect to the Lebesgue measure $d\theta$ on $\mathbb{R}^d$). Assume that $g$ has a compact support $\Theta = [-\epsilon, \epsilon]^d$ for some $\epsilon > 0$. Let $\mu_{\boldsymbol{\theta}}$ be the random measure with density $q_{\boldsymbol{\theta}}(x) = q(x - \boldsymbol{\theta})$ (with respect to the Lebesgue measure $dx$) where $q : \mathbb{R}^d \to \mathbb{R}_+$ is a probability density function with compact support included in $[-A, A]^d$.

Then, the population barycenter $\mu^*$ in the 2-Wasserstein space exists and is unique. It is the measure with density $q(x - \mathbb{E}(\boldsymbol{\theta}))$, namely

$$d\mu^*(x) = q(x - \mathbb{E}(\boldsymbol{\theta})) \, dx.$$

The primal problem $(\mathcal{P})$ satisfies

$$J_{\mathcal{P}} := \inf_{\nu \in \mathcal{M}_+(\Omega)} J(\nu) = \frac{1}{2} \int_\Theta d^2_{W_2}(\mu^*, \mu_\theta) \, g(\theta) \, d\theta = \frac{1}{2} \mathbb{E}\left(|\boldsymbol{\theta} - \mathbb{E}(\boldsymbol{\theta})|^2\right). \qquad (4.19)$$

Moreover, $f^\Theta = (f_\theta)_{\theta \in \Theta} \in L^1(\Theta, X)$, with

$$f_\theta(x) = -g(\theta) \left\langle \theta - \mathbb{E}(\boldsymbol{\theta}), x \right\rangle, \qquad (4.20)$$

is a maximizer of the dual problem $(\mathcal{P}^*)$.

Hence, if it is assumed that the random shifts have zero expectation, i.e. $\mathbb{E}(\boldsymbol{\theta}) = 0$, then the density of the population barycenter $\mu^*$ is the reference template $q$. In this setting, thanks to Theorem 3.1, the empirical barycenter $\bar{\mu}_n$ in the 2-Wasserstein space of the randomly shifted densities $q_{\theta_1}, \ldots, q_{\theta_n}$ is a consistent estimator of $q$. This illustrates the advantages of using the notion of barycenter in the Wasserstein space rather than the Euclidean barycenter $\bar{q}_n$, defined in (4.18), which may lead to inconsistent estimators of a mean pattern.
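To make the passage from Theorem 4.1 to Corollary 4.1 explicit, the quantities (4.8) to (4.11) can be specialized to pure shifts, for which $A_\theta = I$ and $b_\theta = \theta$. The following lines are only a sketch of this substitution, using the notation of Theorem 4.1:

```latex
\begin{align*}
\bar{A} &= \mathbb{E}(A_{\boldsymbol{\theta}}) = I,
  \qquad \bar{b} = \mathbb{E}(b_{\boldsymbol{\theta}}) = \mathbb{E}(\boldsymbol{\theta}), \\
\tilde{A}_\theta &= A_\theta \bar{A}^{-1} = I,
  \qquad \tilde{b}_\theta = b_\theta - A_\theta \bar{A}^{-1} \bar{b}
  = \theta - \mathbb{E}(\boldsymbol{\theta}), \\
q^*(x) &= \det(\bar{A}^{-1}) \, q\bigl(\bar{A}^{-1}(x - \bar{b})\bigr)
  = q\bigl(x - \mathbb{E}(\boldsymbol{\theta})\bigr), \\
\frac{1}{2} \int_\Omega \mathbb{E}\bigl(|\tilde{A}_{\boldsymbol{\theta}} u
  + \tilde{b}_{\boldsymbol{\theta}} - u|^2\bigr) \, q^*(u) \, du
  &= \frac{1}{2} \, \mathbb{E}\bigl(|\boldsymbol{\theta}
  - \mathbb{E}(\boldsymbol{\theta})|^2\bigr) \int_\Omega q^*(u) \, du
  = \frac{1}{2} \, \mathbb{E}\bigl(|\boldsymbol{\theta}
  - \mathbb{E}(\boldsymbol{\theta})|^2\bigr), \\
f_\theta(x) &= -\frac{g(\theta)}{2} \bigl\langle (I - I) x, x \bigr\rangle
  - g(\theta) \bigl\langle \theta - \mathbb{E}(\boldsymbol{\theta}), x \bigr\rangle
  = -g(\theta) \bigl\langle \theta - \mathbb{E}(\boldsymbol{\theta}), x \bigr\rangle.
\end{align*}
```

The third line recovers the density of $\mu^*$, the fourth recovers (4.19), and the last one recovers (4.20).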

4.4 Related results in the literature 

In the literature, there exist various applications of the notion of an empirical barycenter in the Wasserstein space for signal and image processing. For example, it has been successfully used for texture analysis in image processing [24]. The theory of optimal transport for image warping has also been shown to be useful in various applications, see e.g. [19], [20] and references therein.

Some properties of the empirical barycenter $\bar{\mu}_n$ (in the 2-Wasserstein space) of random measures satisfying a deformable model similar to (4.3) have been studied in [?]. For a specific set of measurable maps $(\varphi_i)_{i=1,\ldots,n}$ that is an admissible family of deformations (in the sense of Definition 4.2 in [?]), it has been shown in [?] that the empirical barycenter $\bar{\mu}_n$ of the measures $\mu_i = \varphi_i \# \mu$, $i = 1, \ldots, n$, converges almost surely to the template $\mu$ as $n \to +\infty$ for the $d_{W_2}$ distance, under the additional assumptions that the expectation of the $\varphi_i$'s is equal to the identity and that the measure $\mu$ is compactly supported (see Theorem 4.4 in [?]). Therefore, the results in [?] are consistent with those obtained in Section 3 of this paper on the convergence of the empirical barycenter of random measures that are compactly supported. Nevertheless, the results that we have obtained in this paper are more general than those in [?], since our study on the consistency of the empirical barycenter is not restricted to the deformable model (4.3) with measurable maps $(\varphi_i)_{i=1,\ldots,n}$ being in the so-called class of admissible deformations that is introduced in [?]. Moreover, the problem of proving the existence and the uniqueness of a population barycenter of a random measure, as defined by (1.1), is not considered in [?]. In this paper, we have also shown the benefits of considering the dual formulation $(\mathcal{P}^*)$ of the (primal) problem (2.2) to characterize the population barycenter in the 2-Wasserstein space for a large class of deformable models of measures.

5 Beyond the compactly supported case 

To conclude the paper, we briefly discuss the case of a random measure $\boldsymbol{\mu}$ whose support is not included in a compact set $\Omega$ of $\mathbb{R}^d$.

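Firstly, let us consider the one-dimensional case, i.e. $d = 1$. Let $\mathcal{M}_+^2(\mathbb{R})$ be the set of Radon probability measures $\nu$ on $\mathbb{R}$ having a finite second order moment (i.e. $\int_{\mathbb{R}} |x|^2 \, d\nu(x) < +\infty$). For $\nu \in \mathcal{M}_+^2(\mathbb{R})$, we denote by $F_\nu$ its cumulative distribution function, and by $F_\nu^{-1}$ its generalized inverse (quantile function). In the one-dimensional case ($d = 1$), it is well known that

$$d^2_{W_2}(\nu, \mu) = \int_0^1 \left|F_\nu^{-1}(y) - F_\mu^{-1}(y)\right|^2 dy$$

for any $\nu$ and $\mu$ belonging to $\mathcal{M}_+^2(\mathbb{R})$. Hence, if $\boldsymbol{\mu}$ denotes a random variable with distribution $\mathbb{P}$ taking its values in $\mathcal{M}_+^2(\mathbb{R})$, one has that for any $\nu \in \mathcal{M}_+^2(\mathbb{R})$,

$$\begin{aligned}
\mathbb{E}\left(d^2_{W_2}(\nu, \boldsymbol{\mu})\right)
&= \int_{\mathcal{M}_+^2(\mathbb{R})} \left( \int_0^1 \left|F_\nu^{-1}(y) - F_\mu^{-1}(y)\right|^2 dy \right) d\mathbb{P}(\mu) \\
&= \mathbb{E}\left( \int_0^1 \left|F_\nu^{-1}(y) - F_{\boldsymbol{\mu}}^{-1}(y)\right|^2 dy \right) \\
&= \int_0^1 \mathbb{E}\left|F_\nu^{-1}(y) - F_{\boldsymbol{\mu}}^{-1}(y)\right|^2 dy \\
&\ge \int_0^1 \mathbb{E}\left|\mathbb{E}\left(F_{\boldsymbol{\mu}}^{-1}(y)\right) - F_{\boldsymbol{\mu}}^{-1}(y)\right|^2 dy, \qquad (5.1)
\end{aligned}$$

by applying Fubini's theorem, and by using the inequality $\mathbb{E}|x - X|^2 \ge \mathbb{E}|\mathbb{E}(X) - X|^2$ that holds for any real-valued random variable $X$ with a finite second moment and for any $x \in \mathbb{R}$. Hence, if one can define a measure $\mu^* \in \mathcal{M}_+^2(\mathbb{R})$ such that $F_{\mu^*}^{-1}(y) = \mathbb{E}\left(F_{\boldsymbol{\mu}}^{-1}(y)\right)$ for all $y \in [0, 1]$, then it is clear from inequality (5.1) that

$$\int_{\mathcal{M}_+^2(\mathbb{R})} d^2_{W_2}(\nu, \mu) \, d\mathbb{P}(\mu) \ge \int_{\mathcal{M}_+^2(\mathbb{R})} d^2_{W_2}(\mu^*, \mu) \, d\mathbb{P}(\mu), \quad \text{for any } \nu \in \mathcal{M}_+^2(\mathbb{R}),$$

implying that $\mu^*$ is a population barycenter of the random measure $\boldsymbol{\mu}$ with distribution $\mathbb{P}$. Under additional assumptions on $\boldsymbol{\mu}$ (e.g. if it admits a density with respect to the Lebesgue measure on $\mathbb{R}$), one can show that such a $\mu^*$ exists and is unique. Therefore, in the one-dimensional case, extending some of our results to random measures that are non-compactly supported is certainly not too difficult.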
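The quantile-averaging construction can be tried on measures that are not compactly supported. The sketch below is a toy illustration under stated assumptions: the random measure takes three Gaussian values $N(m_j, s_j^2)$ with equal probability, the parameter values are hypothetical, and the integral over $[0, 1]$ is approximated by a midpoint rule. Since the quantile function of $N(m, s^2)$ is $m + s\,z(y)$, with $z$ the standard normal quantile function, averaging quantiles should give the Gaussian with averaged mean and averaged standard deviation.

```python
from statistics import NormalDist

# Hypothetical values (m_j, s_j) taken by the random measure, each with weight 1/3.
params = [(-1.0, 0.5), (0.0, 1.0), (2.0, 1.5)]


def barycenter_quantile(y):
    """Average of the quantile functions of the measures N(m_j, s_j^2)."""
    return sum(NormalDist(m, s).inv_cdf(y) for m, s in params) / len(params)


def w2_squared(quantile_a, quantile_b, n=2000):
    """Midpoint approximation of the integral over [0, 1] of |Fa^{-1} - Fb^{-1}|^2."""
    h = 1.0 / n
    return h * sum(
        (quantile_a((k + 0.5) * h) - quantile_b((k + 0.5) * h)) ** 2
        for k in range(n)
    )


def objective(candidate_quantile):
    """Expected squared 2-Wasserstein distance from a candidate to the random measure."""
    return sum(
        w2_squared(candidate_quantile, NormalDist(m, s).inv_cdf) for m, s in params
    ) / len(params)


m_bar = sum(m for m, _ in params) / len(params)
s_bar = sum(s for _, s in params) / len(params)
gaussian_bary = NormalDist(m_bar, s_bar).inv_cdf

# Quantile averaging agrees with the Gaussian N(m_bar, s_bar^2) at a few levels.
gap = max(
    abs(barycenter_quantile(y) - gaussian_bary(y)) for y in (0.1, 0.25, 0.5, 0.75, 0.9)
)

# Compare the objective at the barycenter with a hypothetical competitor N(0, 1).
obj_bary = objective(barycenter_quantile)
obj_other = objective(NormalDist(0.0, 1.0).inv_cdf)
print(gap, obj_bary, obj_other)
```

In this example the objective at the quantile-averaged measure is smaller than at the competitor, in line with inequality (5.1).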

The multi-dimensional case (i.e. $d \ge 2$) is more involved. Indeed, the arguments that we used to prove the existence of an optimizer of the dual problem $(\mathcal{P}^*)$, as well as those used to show the convergence of the empirical barycenter to its population counterpart, strongly depend on the compactness assumption for the support of the random measure $\boldsymbol{\mu}$. Adapting these arguments to non-compactly supported measures to study the dual problem $(\mathcal{P}^*)$ and to show the consistency of the empirical barycenter is an interesting topic for future investigations.